Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-11-01 Thread Reynold Xin
Thanks for reporting it, Sjoerd. You might have a different version of
Janino brought in from somewhere else.

This should fix your problem: https://github.com/apache/spark/pull/9372

Can you give it a try?
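
If you want to confirm the classpath theory before trying the patch, a rough
sketch like the one below (run with the same classpath as the failing job)
will show which Janino jar is actually loaded. The object name is just for
illustration; it only uses standard Java reflection, and assumes the class
Spark's codegen compiles through is org.codehaus.janino.ClassBodyEvaluator.

object JaninoClasspathCheck {
  def main(args: Array[String]): Unit = {
    // Locate the jar that provides the Janino compiler class on this classpath.
    val cls = Class.forName("org.codehaus.janino.ClassBodyEvaluator")
    val location = Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString).getOrElse("(no code source)")
    val version = Option(cls.getPackage)
      .flatMap(p => Option(p.getImplementationVersion)).getOrElse("(not set)")
    println(s"Janino loaded from: $location")
    println(s"Implementation version: $version")
  }
}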



On Tue, Oct 27, 2015 at 9:12 PM, Sjoerd Mulder 
wrote:

> No, the job doesn't actually fail, but since our tests are generating all
> these stacktraces I have disabled Tungsten mode just to be safe (and to
> avoid a gazillion stacktraces in production).
>
> 2015-10-27 20:59 GMT+01:00 Josh Rosen :
>
>> Hi Sjoerd,
>>
>> Did your job actually *fail* or did it just generate many spurious
>> exceptions? While the stacktrace that you posted does indicate a bug, I
>> don't think that it should have stopped query execution because Spark
>> should have fallen back to an interpreted code path (note the "Failed to
>> generate ordering, fallback to interpreted" in the error message).
>>
>> On Tue, Oct 27, 2015 at 12:56 PM Sjoerd Mulder 
>> wrote:
>>
>>> I have disabled it because it started generating ERRORs when
>>> upgrading from Spark 1.4 to 1.5.1.
>>>
>>> 2015-10-27T20:50:11.574+0100 ERROR TungstenSort.newOrdering() - Failed
>>> to generate ordering, fallback to interpreted
>>> java.util.concurrent.ExecutionException: java.lang.Exception: failed to
>>> compile: org.codehaus.commons.compiler.CompileException: Line 15, Column 9:
>>> Invalid character input "@" (character code 64)
>>>
>>> public SpecificOrdering
>>> generate(org.apache.spark.sql.catalyst.expressions.Expression[] expr) {
>>>   return new SpecificOrdering(expr);
>>> }
>>>
>>> class SpecificOrdering extends
>>> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
>>>
>>>   private org.apache.spark.sql.catalyst.expressions.Expression[]
>>> expressions;
>>>
>>>
>>>
>>>   public
>>> SpecificOrdering(org.apache.spark.sql.catalyst.expressions.Expression[]
>>> expr) {
>>> expressions = expr;
>>>
>>>   }
>>>
>>>   @Override
>>>   public int compare(InternalRow a, InternalRow b) {
>>> InternalRow i = null;  // Holds current row being evaluated.
>>>
>>> i = a;
>>> boolean isNullA2;
>>> long primitiveA3;
>>> {
>>>   /* input[2, LongType] */
>>>
>>>   boolean isNull0 = i.isNullAt(2);
>>>   long primitive1 = isNull0 ? -1L : (i.getLong(2));
>>>
>>>   isNullA2 = isNull0;
>>>   primitiveA3 = primitive1;
>>> }
>>> i = b;
>>> boolean isNullB4;
>>> long primitiveB5;
>>> {
>>>   /* input[2, LongType] */
>>>
>>>   boolean isNull0 = i.isNullAt(2);
>>>   long primitive1 = isNull0 ? -1L : (i.getLong(2));
>>>
>>>   isNullB4 = isNull0;
>>>   primitiveB5 = primitive1;
>>> }
>>> if (isNullA2 && isNullB4) {
>>>   // Nothing
>>> } else if (isNullA2) {
>>>   return 1;
>>> } else if (isNullB4) {
>>>   return -1;
>>> } else {
>>>   int comp = (primitiveA3 > primitiveB5 ? 1 : primitiveA3 <
>>> primitiveB5 ? -1 : 0);
>>>   if (comp != 0) {
>>> return -comp;
>>>   }
>>> }
>>>
>>> return 0;
>>>   }
>>> }
>>>
>>> at
>>> org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>>> at
>>> org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>>> at
>>> org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>>> at
>>> org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>>> at
>>> org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>>> at
>>> org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>>> at
>>> org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>>> at
>>> org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>>> at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
>>> at
>>> org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>>> at
>>> org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>>> at
>>> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.compile(CodeGenerator.scala:362)
>>> at
>>> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:139)
>>> at
>>> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:37)
>>> at
>>> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:425)
>>> at
>>> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:422)
>>> at
>>> org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:294)
>>> at org.apache.spark.sql.execution.TungstenSort.org
>>> $apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:131)
>>> at
>>> 

unscribe

2015-11-01 Thread Chenxi Li
unscribe


Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Steve Loughran

On 1 Nov 2015, at 03:17, Nicholas Chammas 
> wrote:

https://s3.amazonaws.com/spark-related-packages/

spark-ec2 uses this bucket to download and install HDFS on clusters. Is it 
owned by the Spark project or by the AMPLab?

Anyway, it looks like the latest Hadoop install available on there is Hadoop 
2.4.0.

Are there plans to add newer versions of Hadoop for use by spark-ec2 and 
similar tools, or should we just be getting that stuff via an Apache 
mirror? The latest version is 2.7.1, by 
the way.


you should be grabbing the artifacts off the ASF and then verifying their SHA-1
checksums as published on the ASF HTTPS web site.
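
A rough sketch of that verification step in code, assuming the tarball has
already been downloaded to the working directory (the path and file name are
placeholders):

import java.nio.file.{Files, Paths}
import java.security.MessageDigest

object Sha1OfDownload {
  def main(args: Array[String]): Unit = {
    // Reads the whole file into memory, which is fine for a one-off check.
    val bytes = Files.readAllBytes(Paths.get("hadoop-2.7.1.tar.gz"))
    val sha1 = MessageDigest.getInstance("SHA-1")
      .digest(bytes).map(b => f"$b%02x").mkString
    // Compare this against the checksum published on the ASF HTTPS site.
    println(s"SHA-1: $sha1")
  }
}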


The problem with the Apache mirrors, if I am not mistaken, is that you cannot 
use a single URL that automatically redirects you to a working mirror to 
download Hadoop. You have to pick a specific mirror and pray it doesn't 
disappear tomorrow.


They don't go away, especially http://mirror.ox.ac.uk , and in the US there is
apache.osuosl.org, OSU being where a lot of the ASF
servers are kept.

full list with availability stats

http://www.apache.org/mirrors/




Re: Spark 1.6 Release Schedule

2015-11-01 Thread Sean Owen
I like the idea, but I think there's already a lot of triage backlog. Can
we more concretely address this now and during the next two weeks?

1.6.0 stats from JIRA:

344 issues targeted at 1.6.0, of which
  253 are from committers, of which
    215 are improvements/other, of which
      5 are blockers
    38 are bugs, of which
      4 are blockers
      11 are critical

Tip: It's really easy to manage saved queries for this and other things
with the free JIRA Client (http://almworks.com/jiraclient/overview.html)
that now works with Java 8.
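
For those who prefer scripting it, here is a rough sketch of pulling a count
like the one above from JIRA's REST search endpoint; the JQL string (including
the "Target Version/s" field name) is an assumption about how these stats are
queried, and maxResults=0 asks only for the total:

import java.net.URLEncoder
import scala.io.Source

object TargetedIssueCount {
  def main(args: Array[String]): Unit = {
    val jql = URLEncoder.encode(
      """project = SPARK AND "Target Version/s" = 1.6.0 AND resolution = Unresolved""",
      "UTF-8")
    val url = s"https://issues.apache.org/jira/rest/api/2/search?jql=$jql&maxResults=0"
    val body = Source.fromURL(url).mkString
    // Grab the "total" field with a regex to keep the sketch dependency-free.
    val total = """"total"\s*:\s*(\d+)""".r.findFirstMatchIn(body).map(_.group(1))
    println(s"Issues targeted at 1.6.0: ${total.getOrElse("unknown")}")
  }
}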

It still looks like a lot for a point at which 1.6.0 is, in theory, supposed
to be under test. Lots of (most?) things that were said for several months to
be done for 1.6.0 aren't going to be, and that still surprises me as a
software development practice.

Well, life is busy and chaotic out here in OSS land. I'd still like to push
even more on lightweight triage and release planning, centering around
Target Version, if only to make visible the gap between intention and
reality:

1. Any JIRAs that seem to have been targeted at 1.6.0 by a non-committer
are untargeted, as they shouldn't be to begin with

2. This week, maintainers and interested parties review all JIRAs targeted
at 1.6.0 and untarget/retarget accordingly

3. Start of next week (the final days before an RC), non-Blocker non-bugs
untargeted, or in a few cases pushed to 1.6.1 or beyond

4. After next week, non-Blocker and non-Critical bugs are pushed, as the RC
is then late.

5. No release candidate until no Blockers are open.

6. (Repeat 1 and 2 more regularly through the development period for 1.7
instead of at the end.)

On Sat, Oct 31, 2015 at 11:25 AM, Michael Armbrust 
wrote:

> Hey All,
>
> Just a friendly reminder that today (October 31st) is the scheduled code
> freeze for Spark 1.6.  Since a lot of developers were busy with the Spark
> Summit last week I'm going to delay cutting the branch until Monday,
> November 2nd.  After that point, we'll package a release for testing and
> then go into the normal triage process where bugs are prioritized and some
> smaller features are allowed in on a case by case basis (if they are very
> low risk/additive/feature flagged/etc).
>
> As a reminder, release window dates are always maintained on the wiki and
> are updated after each release according to our 3 month release cadence:
>
> https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage
>
> Thanks!
>
> Michael
>
>


Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
I think that getting them from the ASF mirrors is a better strategy in
general as it'll remove the overhead of keeping the S3 bucket up to
date. It works in the spark-ec2 case because we only support a limited
number of Hadoop versions from the tool. FWIW I don't have write
access to the bucket and also haven't heard of any plans to support
newer versions in spark-ec2.

Thanks
Shivaram

On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran  wrote:
>
> On 1 Nov 2015, at 03:17, Nicholas Chammas 
> wrote:
>
> https://s3.amazonaws.com/spark-related-packages/
>
> spark-ec2 uses this bucket to download and install HDFS on clusters. Is it
> owned by the Spark project or by the AMPLab?
>
> Anyway, it looks like the latest Hadoop install available on there is Hadoop
> 2.4.0.
>
> Are there plans to add newer versions of Hadoop for use by spark-ec2 and
> similar tools, or should we just be getting that stuff via an Apache mirror?
> The latest version is 2.7.1, by the way.
>
>
> you should be grabbing the artifacts off the ASF and then verifying their
> SHA1 checksums as published on the ASF HTTPS web site
>
>
> The problem with the Apache mirrors, if I am not mistaken, is that you
> cannot use a single URL that automatically redirects you to a working mirror
> to download Hadoop. You have to pick a specific mirror and pray it doesn't
> disappear tomorrow.
>
>
> They don't go away, especially http://mirror.ox.ac.uk , and in the us the
> apache.osuosl.org, osu being a where a lot of the ASF servers are kept.
>
> full list with availability stats
>
> http://www.apache.org/mirrors/
>
>




Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-01 Thread Romi Kuntsman
[adding the dev list since it's probably a bug, but I'm not sure how to
reproduce it so that I can open a bug report about it]

Hi,

I have a standalone Spark 1.4.0 cluster with 100s of applications running
every day.

From time to time, applications crash with the following error (see
below). But at the same time (and also after that), other applications run
fine, so I can safely assume the master and workers are working.

1. why is there a NullPointerException? (I can't track the Scala stack
trace to the code, but anyway an NPE is usually an obvious bug even if there's
actually a network error...)
2. why can't it connect to the master? (if it's a network timeout, how do I
increase it? I see the values are hardcoded inside AppClient)
3. how to recover from this error?


  ERROR 01-11 15:32:54,991 SparkDeploySchedulerBackend - Application has
been killed. Reason: All masters are unresponsive! Giving up.
  ERROR 01-11 15:32:55,087 OneForOneStrategy -
  java.lang.NullPointerException
  at
org.apache.spark.deploy.client.AppClient$ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(AppClient.scala:160)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
  at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
  at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
  at
org.apache.spark.deploy.client.AppClient$ClientActor.aroundReceive(AppClient.scala:61)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
  at akka.actor.ActorCell.invoke(ActorCell.scala:487)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
  at akka.dispatch.Mailbox.run(Mailbox.scala:220)
  at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
  at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
  ERROR 01-11 15:32:55,603 SparkContext - Error
initializing SparkContext.
  java.lang.IllegalStateException: Cannot call methods on a stopped
SparkContext
  at org.apache.spark.SparkContext.org
$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
  at
org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1501)
  at
org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2005)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:543)
  at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)


Thanks!

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com


Re: unscribe

2015-11-01 Thread Ted Yu
Please take a look at the first section of spark.apache.org/community

FYI

On Sun, Nov 1, 2015 at 1:09 AM, Chenxi Li  wrote:

> unscribe
>


Re: [Spark MLlib] about linear regression issue

2015-11-01 Thread DB Tsai
For constraints like all weights >= 0, people use LBFGS-B, which is
supported in our optimization library, Breeze:
https://github.com/scalanlp/breeze/issues/323

However, Spark's LiR implementation doesn't support constraints. I do see
this being useful given that we're experimenting with SLIM: Sparse Linear
Methods for recommendation,
http://www-users.cs.umn.edu/~xning/papers/Ning2011c.pdf which requires
all the weights to be positive (Eq. 3) to represent positive relations
between items.

In summary, it's possible and not difficult to add this constraint to
our current linear regression, but currently there is no open-source
implementation in Spark.
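
To make the LBFGS-B option concrete, here is a rough sketch of a non-negative
least-squares solve with Breeze. It assumes the LBFGSB(lowerBounds, upperBounds)
constructor discussed in the Breeze issue above; treat the exact signature as
an assumption rather than a reference:

import breeze.linalg.{DenseMatrix, DenseVector}
import breeze.optimize.{DiffFunction, LBFGSB}

object NonNegativeLeastSquares {
  // Minimize ||Aw - b||^2 subject to w_i >= 0, the kind of box constraint
  // SLIM needs (Eq. 3 in the paper above).
  def solve(a: DenseMatrix[Double], b: DenseVector[Double]): DenseVector[Double] = {
    val cost = new DiffFunction[DenseVector[Double]] {
      def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
        val residual = a * w - b
        // Loss ||Aw - b||^2 and its gradient 2 * A^T (Aw - b).
        (residual dot residual, (a.t * residual) * 2.0)
      }
    }
    val lower = DenseVector.zeros[Double](a.cols)                 // w_i >= 0
    val upper = DenseVector.fill(a.cols)(Double.PositiveInfinity) // no upper bound
    new LBFGSB(lower, upper).minimize(cost, DenseVector.zeros[Double](a.cols))
  }
}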

Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D


On Sun, Nov 1, 2015 at 9:22 AM, Zhiliang Zhu  wrote:
> Dear All,
>
> For N-dimensional linear regression, when the number of labeled training
> points (or the rank of the labeled point space) is less than N,
> then from a math perspective the weights of the trained linear model may
> not be unique.
>
> However, the output of model.weight() from Spark may have some wi < 0. My
> question is: is there some proper way to get
> a specific output weight vector with all wi >= 0 ...
>
> Yes, the above is the same as the issue of solving a linear system of
> equations, Aw = b, where r(A, b) = r(A) < columnNo(A); then w has
> infinitely many solutions, but here we only need one solution with all wi >= 0.
> When there is a unique solution, both LR and SVD work perfectly.
>
> I will appreciate all your kind help very much~~
> Best Regards,
> Zhiliang
>
>




Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
Oh, sweet! For example:

http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz?asjson=1

Thanks for sharing that tip. Looks like you can also use as_json
(vs. asjson).

Nick
​

On Sun, Nov 1, 2015 at 5:32 PM Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas
>  wrote:
> > OK, I’ll focus on the Apache mirrors going forward.
> >
> > The problem with the Apache mirrors, if I am not mistaken, is that you
> > cannot use a single URL that automatically redirects you to a working
> mirror
> > to download Hadoop. You have to pick a specific mirror and pray it
> doesn’t
> > disappear tomorrow.
> >
> > They don’t go away, especially http://mirror.ox.ac.uk , and in the us
> the
> > apache.osuosl.org, osu being a where a lot of the ASF servers are kept.
> >
> > So does Apache offer no way to query a URL and automatically get the
> closest
> > working mirror? If I’m installing HDFS onto servers in various EC2
> regions,
> > the best mirror will vary depending on my location.
> >
> Not sure if this is officially documented somewhere but if you pass
> '?asjson=1' you will get back a JSON which has a 'preferred' field set
> to the closest mirror.
>
> Shivaram
> > Nick
> >
> >
> > On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman
> >  wrote:
> >>
> >> I think that getting them from the ASF mirrors is a better strategy in
> >> general as it'll remove the overhead of keeping the S3 bucket up to
> >> date. It works in the spark-ec2 case because we only support a limited
> >> number of Hadoop versions from the tool. FWIW I don't have write
> >> access to the bucket and also haven't heard of any plans to support
> >> newer versions in spark-ec2.
> >>
> >> Thanks
> >> Shivaram
> >>
> >> On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran 
> >> wrote:
> >> >
> >> > On 1 Nov 2015, at 03:17, Nicholas Chammas  >
> >> > wrote:
> >> >
> >> > https://s3.amazonaws.com/spark-related-packages/
> >> >
> >> > spark-ec2 uses this bucket to download and install HDFS on clusters.
> Is
> >> > it
> >> > owned by the Spark project or by the AMPLab?
> >> >
> >> > Anyway, it looks like the latest Hadoop install available on there is
> >> > Hadoop
> >> > 2.4.0.
> >> >
> >> > Are there plans to add newer versions of Hadoop for use by spark-ec2
> and
> >> > similar tools, or should we just be getting that stuff via an Apache
> >> > mirror?
> >> > The latest version is 2.7.1, by the way.
> >> >
> >> >
> >> > you should be grabbing the artifacts off the ASF and then verifying
> >> > their
> >> > SHA1 checksums as published on the ASF HTTPS web site
> >> >
> >> >
> >> > The problem with the Apache mirrors, if I am not mistaken, is that you
> >> > cannot use a single URL that automatically redirects you to a working
> >> > mirror
> >> > to download Hadoop. You have to pick a specific mirror and pray it
> >> > doesn't
> >> > disappear tomorrow.
> >> >
> >> >
> >> > They don't go away, especially http://mirror.ox.ac.uk , and in the us
> >> > the
> >> > apache.osuosl.org, osu being a where a lot of the ASF servers are
> kept.
> >> >
> >> > full list with availability stats
> >> >
> >> > http://www.apache.org/mirrors/
> >> >
> >> >
>


Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
I think the lua one at
https://svn.apache.org/repos/asf/infrastructure/site/trunk/content/dyn/closer.lua
has replaced the cgi one from before. It also looks like the lua one
supports `action=download` with a filename argument. So you could
just do something like

wget 
http://www.apache.org/dyn/closer.lua?filename=hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz&action=download

Thanks
Shivaram

On Sun, Nov 1, 2015 at 3:18 PM, Nicholas Chammas
 wrote:
> Oh, sweet! For example:
>
> http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz?asjson=1
>
> Thanks for sharing that tip. Looks like you can also use as_json (vs.
> asjson).
>
> Nick
>
>
> On Sun, Nov 1, 2015 at 5:32 PM Shivaram Venkataraman
>  wrote:
>>
>> On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas
>>  wrote:
>> > OK, I’ll focus on the Apache mirrors going forward.
>> >
>> > The problem with the Apache mirrors, if I am not mistaken, is that you
>> > cannot use a single URL that automatically redirects you to a working
>> > mirror
>> > to download Hadoop. You have to pick a specific mirror and pray it
>> > doesn’t
>> > disappear tomorrow.
>> >
>> > They don’t go away, especially http://mirror.ox.ac.uk , and in the us
>> > the
>> > apache.osuosl.org, osu being a where a lot of the ASF servers are kept.
>> >
>> > So does Apache offer no way to query a URL and automatically get the
>> > closest
>> > working mirror? If I’m installing HDFS onto servers in various EC2
>> > regions,
>> > the best mirror will vary depending on my location.
>> >
>> Not sure if this is officially documented somewhere but if you pass
>> '?asjson=1' you will get back a JSON which has a 'preferred' field set
>> to the closest mirror.
>>
>> Shivaram
>> > Nick
>> >
>> >
>> > On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman
>> >  wrote:
>> >>
>> >> I think that getting them from the ASF mirrors is a better strategy in
>> >> general as it'll remove the overhead of keeping the S3 bucket up to
>> >> date. It works in the spark-ec2 case because we only support a limited
>> >> number of Hadoop versions from the tool. FWIW I don't have write
>> >> access to the bucket and also haven't heard of any plans to support
>> >> newer versions in spark-ec2.
>> >>
>> >> Thanks
>> >> Shivaram
>> >>
>> >> On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran 
>> >> wrote:
>> >> >
>> >> > On 1 Nov 2015, at 03:17, Nicholas Chammas
>> >> > 
>> >> > wrote:
>> >> >
>> >> > https://s3.amazonaws.com/spark-related-packages/
>> >> >
>> >> > spark-ec2 uses this bucket to download and install HDFS on clusters.
>> >> > Is
>> >> > it
>> >> > owned by the Spark project or by the AMPLab?
>> >> >
>> >> > Anyway, it looks like the latest Hadoop install available on there is
>> >> > Hadoop
>> >> > 2.4.0.
>> >> >
>> >> > Are there plans to add newer versions of Hadoop for use by spark-ec2
>> >> > and
>> >> > similar tools, or should we just be getting that stuff via an Apache
>> >> > mirror?
>> >> > The latest version is 2.7.1, by the way.
>> >> >
>> >> >
>> >> > you should be grabbing the artifacts off the ASF and then verifying
>> >> > their
>> >> > SHA1 checksums as published on the ASF HTTPS web site
>> >> >
>> >> >
>> >> > The problem with the Apache mirrors, if I am not mistaken, is that
>> >> > you
>> >> > cannot use a single URL that automatically redirects you to a working
>> >> > mirror
>> >> > to download Hadoop. You have to pick a specific mirror and pray it
>> >> > doesn't
>> >> > disappear tomorrow.
>> >> >
>> >> >
>> >> > They don't go away, especially http://mirror.ox.ac.uk , and in the us
>> >> > the
>> >> > apache.osuosl.org, osu being a where a lot of the ASF servers are
>> >> > kept.
>> >> >
>> >> > full list with availability stats
>> >> >
>> >> > http://www.apache.org/mirrors/
>> >> >
>> >> >




Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
Hmm, yeah, some Googling confirms this, though there isn't any clear
documentation about it.

Strangely, if I click on the link from your email the download works, but
curl and wget somehow don't get redirected correctly...

Nick

On Sun, Nov 1, 2015 at 6:40 PM Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> I think the lua one at
>
> https://svn.apache.org/repos/asf/infrastructure/site/trunk/content/dyn/closer.lua
> has replaced the cgi one from before. Also it looks like the lua one
> also supports `action=download` with a filename argument. So you could
> just do something like
>
> wget
> http://www.apache.org/dyn/closer.lua?filename=hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz&action=download
>
> Thanks
> Shivaram
>
> On Sun, Nov 1, 2015 at 3:18 PM, Nicholas Chammas
>  wrote:
> > Oh, sweet! For example:
> >
> >
> http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz?asjson=1
> >
> > Thanks for sharing that tip. Looks like you can also use as_json (vs.
> > asjson).
> >
> > Nick
> >
> >
> > On Sun, Nov 1, 2015 at 5:32 PM Shivaram Venkataraman
> >  wrote:
> >>
> >> On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas
> >>  wrote:
> >> > OK, I’ll focus on the Apache mirrors going forward.
> >> >
> >> > The problem with the Apache mirrors, if I am not mistaken, is that you
> >> > cannot use a single URL that automatically redirects you to a working
> >> > mirror
> >> > to download Hadoop. You have to pick a specific mirror and pray it
> >> > doesn’t
> >> > disappear tomorrow.
> >> >
> >> > They don’t go away, especially http://mirror.ox.ac.uk , and in the us
> >> > the
> >> > apache.osuosl.org, osu being a where a lot of the ASF servers are
> kept.
> >> >
> >> > So does Apache offer no way to query a URL and automatically get the
> >> > closest
> >> > working mirror? If I’m installing HDFS onto servers in various EC2
> >> > regions,
> >> > the best mirror will vary depending on my location.
> >> >
> >> Not sure if this is officially documented somewhere but if you pass
> >> '?asjson=1' you will get back a JSON which has a 'preferred' field set
> >> to the closest mirror.
> >>
> >> Shivaram
> >> > Nick
> >> >
> >> >
> >> > On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman
> >> >  wrote:
> >> >>
> >> >> I think that getting them from the ASF mirrors is a better strategy
> in
> >> >> general as it'll remove the overhead of keeping the S3 bucket up to
> >> >> date. It works in the spark-ec2 case because we only support a
> limited
> >> >> number of Hadoop versions from the tool. FWIW I don't have write
> >> >> access to the bucket and also haven't heard of any plans to support
> >> >> newer versions in spark-ec2.
> >> >>
> >> >> Thanks
> >> >> Shivaram
> >> >>
> >> >> On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran <
> ste...@hortonworks.com>
> >> >> wrote:
> >> >> >
> >> >> > On 1 Nov 2015, at 03:17, Nicholas Chammas
> >> >> > 
> >> >> > wrote:
> >> >> >
> >> >> > https://s3.amazonaws.com/spark-related-packages/
> >> >> >
> >> >> > spark-ec2 uses this bucket to download and install HDFS on
> clusters.
> >> >> > Is
> >> >> > it
> >> >> > owned by the Spark project or by the AMPLab?
> >> >> >
> >> >> > Anyway, it looks like the latest Hadoop install available on there
> is
> >> >> > Hadoop
> >> >> > 2.4.0.
> >> >> >
> >> >> > Are there plans to add newer versions of Hadoop for use by
> spark-ec2
> >> >> > and
> >> >> > similar tools, or should we just be getting that stuff via an
> Apache
> >> >> > mirror?
> >> >> > The latest version is 2.7.1, by the way.
> >> >> >
> >> >> >
> >> >> > you should be grabbing the artifacts off the ASF and then verifying
> >> >> > their
> >> >> > SHA1 checksums as published on the ASF HTTPS web site
> >> >> >
> >> >> >
> >> >> > The problem with the Apache mirrors, if I am not mistaken, is that
> >> >> > you
> >> >> > cannot use a single URL that automatically redirects you to a
> working
> >> >> > mirror
> >> >> > to download Hadoop. You have to pick a specific mirror and pray it
> >> >> > doesn't
> >> >> > disappear tomorrow.
> >> >> >
> >> >> >
> >> >> > They don't go away, especially http://mirror.ox.ac.uk , and in
> the us
> >> >> > the
> >> >> > apache.osuosl.org, osu being a where a lot of the ASF servers are
> >> >> > kept.
> >> >> >
> >> >> > full list with availability stats
> >> >> >
> >> >> > http://www.apache.org/mirrors/
> >> >> >
> >> >> >
>


Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
OK, I’ll focus on the Apache mirrors going forward.

The problem with the Apache mirrors, if I am not mistaken, is that you
cannot use a single URL that automatically redirects you to a working
mirror to download Hadoop. You have to pick a specific mirror and pray it
doesn’t disappear tomorrow.

They don’t go away, especially http://mirror.ox.ac.uk , and in the US there is
apache.osuosl.org, OSU being where a lot of the ASF servers are kept.

So does Apache offer no way to query a URL and automatically get the
closest working mirror? If I’m installing HDFS onto servers in various EC2
regions, the best mirror will vary depending on my location.

Nick
​

On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> I think that getting them from the ASF mirrors is a better strategy in
> general as it'll remove the overhead of keeping the S3 bucket up to
> date. It works in the spark-ec2 case because we only support a limited
> number of Hadoop versions from the tool. FWIW I don't have write
> access to the bucket and also haven't heard of any plans to support
> newer versions in spark-ec2.
>
> Thanks
> Shivaram
>
> On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran 
> wrote:
> >
> > On 1 Nov 2015, at 03:17, Nicholas Chammas 
> > wrote:
> >
> > https://s3.amazonaws.com/spark-related-packages/
> >
> > spark-ec2 uses this bucket to download and install HDFS on clusters. Is
> it
> > owned by the Spark project or by the AMPLab?
> >
> > Anyway, it looks like the latest Hadoop install available on there is
> Hadoop
> > 2.4.0.
> >
> > Are there plans to add newer versions of Hadoop for use by spark-ec2 and
> > similar tools, or should we just be getting that stuff via an Apache
> mirror?
> > The latest version is 2.7.1, by the way.
> >
> >
> > you should be grabbing the artifacts off the ASF and then verifying their
> > SHA1 checksums as published on the ASF HTTPS web site
> >
> >
> > The problem with the Apache mirrors, if I am not mistaken, is that you
> > cannot use a single URL that automatically redirects you to a working
> mirror
> > to download Hadoop. You have to pick a specific mirror and pray it
> doesn't
> > disappear tomorrow.
> >
> >
> > They don't go away, especially http://mirror.ox.ac.uk , and in the us
> the
> > apache.osuosl.org, osu being a where a lot of the ASF servers are kept.
> >
> > full list with availability stats
> >
> > http://www.apache.org/mirrors/
> >
> >
>


Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas
 wrote:
> OK, I’ll focus on the Apache mirrors going forward.
>
> The problem with the Apache mirrors, if I am not mistaken, is that you
> cannot use a single URL that automatically redirects you to a working mirror
> to download Hadoop. You have to pick a specific mirror and pray it doesn’t
> disappear tomorrow.
>
> They don’t go away, especially http://mirror.ox.ac.uk , and in the us the
> apache.osuosl.org, osu being a where a lot of the ASF servers are kept.
>
> So does Apache offer no way to query a URL and automatically get the closest
> working mirror? If I’m installing HDFS onto servers in various EC2 regions,
> the best mirror will vary depending on my location.
>
Not sure if this is officially documented somewhere but if you pass
'?asjson=1' you will get back a JSON which has a 'preferred' field set
to the closest mirror.
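
A small sketch of using that from code; the URL parameter and the 'preferred'
field are the ones mentioned in this thread, and the regex-based parsing is
only there to keep the example dependency-free:

import scala.io.Source

object ClosestMirror {
  def main(args: Array[String]): Unit = {
    val url = "https://www.apache.org/dyn/closer.cgi/hadoop/common/" +
      "hadoop-2.7.1/hadoop-2.7.1.tar.gz?asjson=1"
    val json = Source.fromURL(url).mkString
    // The JSON response has a 'preferred' field pointing at the closest mirror.
    val preferred = """"preferred"\s*:\s*"([^"]+)""".r.findFirstMatchIn(json).map(_.group(1))
    println(s"Preferred mirror: ${preferred.getOrElse("(not found)")}")
  }
}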

Shivaram
> Nick
>
>
> On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman
>  wrote:
>>
>> I think that getting them from the ASF mirrors is a better strategy in
>> general as it'll remove the overhead of keeping the S3 bucket up to
>> date. It works in the spark-ec2 case because we only support a limited
>> number of Hadoop versions from the tool. FWIW I don't have write
>> access to the bucket and also haven't heard of any plans to support
>> newer versions in spark-ec2.
>>
>> Thanks
>> Shivaram
>>
>> On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran 
>> wrote:
>> >
>> > On 1 Nov 2015, at 03:17, Nicholas Chammas 
>> > wrote:
>> >
>> > https://s3.amazonaws.com/spark-related-packages/
>> >
>> > spark-ec2 uses this bucket to download and install HDFS on clusters. Is
>> > it
>> > owned by the Spark project or by the AMPLab?
>> >
>> > Anyway, it looks like the latest Hadoop install available on there is
>> > Hadoop
>> > 2.4.0.
>> >
>> > Are there plans to add newer versions of Hadoop for use by spark-ec2 and
>> > similar tools, or should we just be getting that stuff via an Apache
>> > mirror?
>> > The latest version is 2.7.1, by the way.
>> >
>> >
>> > you should be grabbing the artifacts off the ASF and then verifying
>> > their
>> > SHA1 checksums as published on the ASF HTTPS web site
>> >
>> >
>> > The problem with the Apache mirrors, if I am not mistaken, is that you
>> > cannot use a single URL that automatically redirects you to a working
>> > mirror
>> > to download Hadoop. You have to pick a specific mirror and pray it
>> > doesn't
>> > disappear tomorrow.
>> >
>> >
>> > They don't go away, especially http://mirror.ox.ac.uk , and in the us
>> > the
>> > apache.osuosl.org, osu being a where a lot of the ASF servers are kept.
>> >
>> > full list with availability stats
>> >
>> > http://www.apache.org/mirrors/
>> >
>> >




Re: Unable to run applications on spark in standalone cluster mode

2015-11-01 Thread Akhil Das
Can you paste the contents of your spark-env.sh file? It would also be good to
have a look at the /etc/hosts file. The "cannot bind to the given IP address"
error can be resolved by putting the hostname instead of the IP address. Also
make sure the configuration (conf directory) across your cluster has the same
contents.

Thanks
Best Regards

On Mon, Oct 26, 2015 at 10:48 AM, Rohith P 
wrote:

> No.. the ./sbin/start-master.sh --ip option did not work... It is still the
> same error
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Unable-to-run-applications-on-spark-in-standalone-cluster-mode-tp14683p14779.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Implementation of RNN/LSTM in Spark

2015-11-01 Thread Sasaki Kai
Hi, Disha

There seem to be no JIRAs on RNN/LSTM directly, but there are several tickets
about other types of networks related to deep learning.

Stacked Auto Encoder
https://issues.apache.org/jira/browse/SPARK-2623

CNN
https://issues.apache.org/jira/browse/SPARK-9129
https://issues.apache.org/jira/browse/SPARK-9273

Roadmap of MLlib deep learning
https://issues.apache.org/jira/browse/SPARK-5575

I think it may be good to join the discussion on SPARK-5575.
Best

Kai Sasaki


> On Nov 2, 2015, at 1:59 PM, Disha Shrivastava  wrote:
> 
> Hi,
> 
> I wanted to know if someone is working on implementing RNN/LSTM in Spark or 
> has already done. I am also willing to contribute to it and get some guidance 
> on how to go about it.
> 
> Thanks and Regards
> Disha
> Masters Student, IIT Delhi



Implementation of RNN/LSTM in Spark

2015-11-01 Thread Disha Shrivastava
Hi,

I wanted to know if someone is working on implementing RNN/LSTM in Spark or
has already done so. I am also willing to contribute to it and get some
guidance on how to go about it.

Thanks and Regards
Disha
Masters Student, IIT Delhi