I want to contribute two quality measures (ARHR and HR) for top-N recommendation systems to MLlib. Is this meaningful?

2014-08-25 Thread Lizhengbing (bing, BIPA)
Hi:
In the paper Item-Based Top-N Recommendation 
Algorithms (https://stuyresearch.googlecode.com/hg/blake/resources/10.1.1.102.4451.pdf),
there are two measures of recommendation quality: HR and ARHR.
If I use ALS (implicit) for a top-N recommendation system, I want to check its 
quality, and ARHR and HR are two good quality measures.
I want to contribute them to Spark MLlib, so I want to know whether this is 
meaningful.


(1) If n is the total number of customers/users, the hit rate of the 
recommendation algorithm is computed as:
hit-rate (HR) = Number of hits / n

(2) If h is the number of hits that occurred at positions p1, p2, ..., ph 
within the top-N lists (i.e., 1 ≤ pi ≤ N), then the average reciprocal hit-rank 
is equal to:
average reciprocal hit-rank (ARHR) = (1/n) * (1/p1 + 1/p2 + ... + 1/ph)
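For concreteness, here is a minimal sketch of how HR and ARHR could be computed
from per-user results, written in plain Scala (illustrative only; the object and
parameter names are made up and not an existing MLlib API):

object TopNMetrics {
  // hitRanks: for each of the n test users, the 1-based rank of the held-out
  // item in that user's top-N list, or None if it was not recommended (a miss).
  def hitRate(hitRanks: Seq[Option[Int]]): Double =
    hitRanks.count(_.isDefined).toDouble / hitRanks.size

  // ARHR weights each hit by the reciprocal of its rank, so hits near the top
  // of the list count more; misses contribute 0.
  def avgReciprocalHitRank(hitRanks: Seq[Option[Int]]): Double =
    hitRanks.map {
      case Some(rank) => 1.0 / rank
      case None       => 0.0
    }.sum / hitRanks.size
}

// Example: 4 users, hits at ranks 1 and 3, two misses.
// HR = 2/4 = 0.5, ARHR = (1/1 + 1/3) / 4 = 0.333...
val ranks = Seq(Some(1), None, Some(3), None)
TopNMetrics.hitRate(ranks)              // 0.5
TopNMetrics.avgReciprocalHitRank(ranks) // 0.333...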


Re: Mesos/Spark Deadlock

2014-08-25 Thread Gary Malouf
We have not tried the work-around because there are other bugs in there
that affected our set-up, though it seems it would help.


On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote:

 +1 to have the work around in.

 I'll be investigating from the Mesos side too.

 Tim

 On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too
 bad that this happens in fine-grained mode -- would be really good to fix.
 I'll see if we can get the workaround in
 https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally
 have you tried that?
 
  Matei
 
  On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com)
 wrote:
 
  Hi Matei,
 
  We have an analytics team that uses the cluster on a daily basis.  They
 use two types of 'run modes':
 
  1) For running actual queries, they set the spark.executor.memory to
 something between 4 and 8GB of RAM/worker.
 
  2) A shell that takes a minimal amount of memory on workers (128MB) for
 prototyping out a larger query.  This allows them to not take up RAM on the
 cluster when they do not really need it.
 
  We see the deadlocks when there are a few shells in either case.  From
 the usage patterns we have, coarse-grained mode would be a challenge as we
 have to constantly remind people to kill their shells as soon as their
 queries finish.
 
  Am I correct in viewing Mesos in coarse-grained mode as being similar to
 Spark Standalone's cpu allocation behavior?
 
 
 
 
  On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Hey Gary, just as a workaround, note that you can use Mesos in
 coarse-grained mode by setting spark.mesos.coarse=true. Then it will hold
 onto CPUs for the duration of the job.
 
  Matei
 
  On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com)
 wrote:
 
  I just wanted to bring up a significant Mesos/Spark issue that makes the
  combo difficult to use for teams larger than 4-5 people. It's covered in
  https://issues.apache.org/jira/browse/MESOS-1688. My understanding is
 that
  Spark's use of executors in fine-grained mode is a very different
 behavior
  than many of the other common frameworks for Mesos.
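(For reference, a minimal sketch of the two settings discussed in this thread,
spark.executor.memory and spark.mesos.coarse, as they might be set when building
a SparkContext; the Mesos master URL and the values are illustrative only.)

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("mesos://zk://zk-host:2181/mesos")   // illustrative Mesos master URL
  .setAppName("analytics-shell")
  .set("spark.executor.memory", "4g")             // per-worker executor memory
  .set("spark.mesos.coarse", "true")              // hold CPUs for the job's duration
val sc = new SparkContext(conf)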
 



Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Nicholas Chammas
FYI: Looks like the Mesos folk also have a bot to do automatic linking, but
it appears to have been provided to them somehow by ASF.

See this comment as an example:
https://issues.apache.org/jira/browse/MESOS-1688?focusedCommentId=14109078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14109078

Might be a small win to push this work to a bot ASF manages if we can get
access to it (and if we have no concerns about depending on another
external service).

Nick


On Mon, Aug 11, 2014 at 4:10 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Thanks for looking into this. I think little tools like this are super
 helpful.

 Would it hurt to open a request with INFRA to install/configure the
 JIRA-GitHub plugin while we continue to use the Python script we have? I
 wouldn't mind opening that JIRA issue with them.

 Nick


 On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 I spent some time on this and I'm not sure either of these is an option,
 unfortunately.

 We typically can't use custom JIRA plug-ins because this JIRA is
 controlled by the ASF and we don't have rights to modify most things about
 how it works (it's a large shared JIRA instance used by more than 50
 projects). It's worth looking into whether they can do something. In
 general we've tended to avoid going through ASF infra whenever
 possible, since they are generally overloaded and things move very slowly,
 even when there are outages.

 Here is the script we use to do the sync:
 https://github.com/apache/spark/blob/master/dev/github_jira_sync.py

 It might be possible to modify this to support post-hoc changes, but we'd
 need to think about how to do so while minimizing function calls to the ASF
 JIRA API, which I found are very slow.

 - Patrick



 On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It looks like this script doesn't catch PRs that are opened and *then*
 have

 the JIRA issue ID added to the name. Would it be easy to somehow have the
 script trigger on PR name changes as well as PR creates?

 Alternately, is there a reason we can't or don't want to use the plugin
 mentioned below? (I'm assuming it covers cases like this, but I'm not
 sure.)

 Nick



 On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

  By the way, it looks like there’s a JIRA plugin that integrates it with
  GitHub:
 
 -
 
 https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin

 -
 
 https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA
 
  It does the automatic linking and shows some additional information
  
 https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png
 

  that might be nice to have for heavy JIRA users.
 
  Nick
 
 
 
  On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Yeah it needs to have SPARK-XXX in the title (this is the format we
  request already). It just works with a small synchronization script I
  wrote that we run every five minutes on Jenkins that uses the Github
  and Jenkins API:
 
 
 
 https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929
 
  - Patrick
 
  On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   That's pretty neat.
  
   How does it work? Do we just need to put the issue ID (e.g.
 SPARK-1234)
   anywhere in the pull request?
  
   Nick
  
  
   On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
  
   Just a small note, today I committed a tool that will automatically
   mirror pull requests to JIRA issues, so contributors will no longer
   have to manually post a pull request on the JIRA when they make
 one.
  
   It will create a link on the JIRA and also make a comment to
 trigger
   an e-mail to people watching.
  
   This should make some things easier, such as avoiding accidental
   duplicate effort on the same JIRA.
  
   - Patrick
  
 
 
 






Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-25 Thread npanj
I am running the code with @rxin's patch in standalone mode. In my case I am
registering org.apache.spark.graphx.GraphKryoRegistrator.

Recently I started to see com.esotericsoftware.kryo.KryoException:
java.io.IOException: failed to uncompress the chunk: PARSING_ERROR. Has
anyone seen this? Could it be related to this issue? Here is the trace: 
--
vids (org.apache.spark.graphx.impl.VertexAttributeBlock)
com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
com.esotericsoftware.kryo.io.Input.require(Input.java:169)
com.esotericsoftware.kryo.io.Input.readLong_slow(Input.java:710)
com.esotericsoftware.kryo.io.Input.readLong(Input.java:665)
   
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:127)
   
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:107)
com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
   
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
   
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
   
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
   
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
   
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1054)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
   
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
   
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   
org.apache.spark.graphx.impl.VertexPartitionBaseOps.innerJoinKeepLeft(VertexPartitionBaseOps.scala:192)
   
org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:78)
   
org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
   
org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   
org.apache.spark.graphx.EdgeRDD$$anonfun$mapEdgePartitions$1.apply(EdgeRDD.scala:87)
   
org.apache.spark.graphx.EdgeRDD$$anonfun$mapEdgePartitions$1.apply(EdgeRDD.scala:85)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
   
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
   
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:202)
   
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

--
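(For context, a minimal sketch of how the Kryo serializer and registrator
mentioned above are typically wired up in Spark 1.x configuration; the settings
are the standard keys, the app name and values are illustrative.)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("graphx-job")
  // Use Kryo instead of Java serialization and register the GraphX classes.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.spark.graphx.GraphKryoRegistrator")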







Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread amnonkhen
Hi jerryye,
Maybe if you voted up my question on Stack Overflow it would get some
traction and we would get nearer to a solution.
Thanks,
  Amnon






Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
This is kind of weird then, seems perhaps unrelated to this issue (or at least 
to the way I understood it). Is the problem maybe that Mesos saw 0 MB being 
freed and didn't re-offer the machine *even though there was more than 32 MB 
free overall*?

Matei

On August 25, 2014 at 12:59:59 PM, Cody Koeninger (c...@koeninger.org) wrote:

I definitely saw a case where

a. the only job running was a 256m shell
b. I started a 2g job
c. a little while later the same user as in a started another 256m shell

My job immediately stopped making progress.  Once user a killed his shells, it 
started again.

This is on nodes with ~15G of memory, on which we have successfully run 8G jobs.


On Mon, Aug 25, 2014 at 2:02 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
BTW it seems to me that even without that patch, you should be getting tasks 
launched as long as you leave at least 32 MB of memory free on each machine 
(that is, the sum of the executor memory sizes is not exactly the same as the 
total size of the machine). Then Mesos will be able to re-offer that machine 
whenever CPUs free up.

Matei

On August 25, 2014 at 5:05:56 AM, Gary Malouf (malouf.g...@gmail.com) wrote:

We have not tried the work-around because there are other bugs in there
that affected our set-up, though it seems it would help.


On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote:

 +1 to have the work around in.

 I'll be investigating from the Mesos side too.

 Tim

 On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too
 bad that this happens in fine-grained mode -- would be really good to fix.
 I'll see if we can get the workaround in
 https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally
 have you tried that?
 
  Matei
 
  On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com)
 wrote:
 
  Hi Matei,
 
  We have an analytics team that uses the cluster on a daily basis. They
 use two types of 'run modes':
 
  1) For running actual queries, they set the spark.executor.memory to
 something between 4 and 8GB of RAM/worker.
 
  2) A shell that takes a minimal amount of memory on workers (128MB) for
 prototyping out a larger query. This allows them to not take up RAM on the
 cluster when they do not really need it.
 
  We see the deadlocks when there are a few shells in either case. From
 the usage patterns we have, coarse-grained mode would be a challenge as we
 have to constantly remind people to kill their shells as soon as their
 queries finish.
 
  Am I correct in viewing Mesos in coarse-grained mode as being similar to
 Spark Standalone's cpu allocation behavior?
 
 
 
 
  On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Hey Gary, just as a workaround, note that you can use Mesos in
 coarse-grained mode by setting spark.mesos.coarse=true. Then it will hold
 onto CPUs for the duration of the job.
 
  Matei
 
  On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com)
 wrote:
 
  I just wanted to bring up a significant Mesos/Spark issue that makes the
  combo difficult to use for teams larger than 4-5 people. It's covered in
  https://issues.apache.org/jira/browse/MESOS-1688. My understanding is
 that
  Spark's use of executors in fine-grained mode is a very different
 behavior
  than many of the other common frameworks for Mesos.
 




Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
Anyway it would be good if someone from the Mesos side investigates this and 
proposes a solution. The 32 MB per task hack isn't completely foolproof either 
(e.g. people might allocate all the RAM to their executor and thus stop being 
able to launch tasks), so maybe we wait on a Mesos fix for this one.

Matei

On August 25, 2014 at 1:07:15 PM, Matei Zaharia (matei.zaha...@gmail.com) wrote:

This is kind of weird then, seems perhaps unrelated to this issue (or at least 
to the way I understood it). Is the problem maybe that Mesos saw 0 MB being 
freed and didn't re-offer the machine *even though there was more than 32 MB 
free overall*?

Matei

On August 25, 2014 at 12:59:59 PM, Cody Koeninger (c...@koeninger.org) wrote:

I definitely saw a case where

a. the only job running was a 256m shell
b. I started a 2g job
c. a little while later the same user as in a started another 256m shell

My job immediately stopped making progress.  Once user a killed his shells, it 
started again.

This is on nodes with ~15G of memory, on which we have successfully run 8G jobs.


On Mon, Aug 25, 2014 at 2:02 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
BTW it seems to me that even without that patch, you should be getting tasks 
launched as long as you leave at least 32 MB of memory free on each machine 
(that is, the sum of the executor memory sizes is not exactly the same as the 
total size of the machine). Then Mesos will be able to re-offer that machine 
whenever CPUs free up.

Matei

On August 25, 2014 at 5:05:56 AM, Gary Malouf (malouf.g...@gmail.com) wrote:

We have not tried the work-around because there are other bugs in there
that affected our set-up, though it seems it would help.


On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote:

 +1 to have the work around in.

 I'll be investigating from the Mesos side too.

 Tim

 On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too
 bad that this happens in fine-grained mode -- would be really good to fix.
 I'll see if we can get the workaround in
 https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally
 have you tried that?
 
  Matei
 
  On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com)
 wrote:
 
  Hi Matei,
 
  We have an analytics team that uses the cluster on a daily basis. They
 use two types of 'run modes':
 
  1) For running actual queries, they set the spark.executor.memory to
 something between 4 and 8GB of RAM/worker.
 
  2) A shell that takes a minimal amount of memory on workers (128MB) for
 prototyping out a larger query. This allows them to not take up RAM on the
 cluster when they do not really need it.
 
  We see the deadlocks when there are a few shells in either case. From
 the usage patterns we have, coarse-grained mode would be a challenge as we
 have to constantly remind people to kill their shells as soon as their
 queries finish.
 
  Am I correct in viewing Mesos in coarse-grained mode as being similar to
 Spark Standalone's cpu allocation behavior?
 
 
 
 
  On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  Hey Gary, just as a workaround, note that you can use Mesos in
 coarse-grained mode by setting spark.mesos.coarse=true. Then it will hold
 onto CPUs for the duration of the job.
 
  Matei
 
  On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com)
 wrote:
 
  I just wanted to bring up a significant Mesos/Spark issue that makes the
  combo difficult to use for teams larger than 4-5 people. It's covered in
  https://issues.apache.org/jira/browse/MESOS-1688. My understanding is
 that
  Spark's use of executors in fine-grained mode is a very different
 behavior
  than many of the other common frameworks for Mesos.
 




Re: Working Formula for Hive 0.13?

2014-08-25 Thread Michael Armbrust
Thanks for working on this!  It's unclear at the moment exactly how we are
going to handle this, since the end goal is to be compatible with as many
versions of Hive as possible.  That said, I think it would be great to open
a PR in this case.  Even if we don't merge it, that's a good way to get it
on people's radar and have a discussion about the changes that are required.


On Sun, Aug 24, 2014 at 7:11 PM, scwf wangf...@huawei.com wrote:

    I have been working on a branch that updates the Hive version to hive-0.13 (using
  org.apache.hive): https://github.com/scwf/spark/tree/hive-0.13
  I am wondering whether it's OK to make a PR now, because the hive-0.13 version
  is not compatible with hive-0.12 and here I used org.apache.hive.
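 (A rough sketch of the coordinate change being discussed, as it might look in
 sbt; illustrative only, not the actual content of the branch:)

 // Spark's Hive 0.12 build pulls a re-published hive-exec artifact; the branch
 // above switches to the upstream org.apache.hive coordinates instead, roughly:
 libraryDependencies += "org.apache.hive" % "hive-exec" % "0.13.1"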



 On 2014/7/29 8:22, Michael Armbrust wrote:

 A few things:
   - When we upgrade to Hive 0.13.0, Patrick will likely republish the
 hive-exec jar just as we did for 0.12.0
   - Since we have to tie into some pretty low level APIs it is
 unsurprising
 that the code doesn't just compile out of the box against 0.13.0
   - ScalaReflection is for determining Schema from Scala classes, not
  reflection-based bridge code.  Either way, it's unclear if there is any
 reason to use reflection to support multiple versions, instead of just
 upgrading to Hive 0.13.0

 One question I have is, What is the goal of upgrading to hive 0.13.0?  Is
 it purely because you are having problems connecting to newer metastores?
   Are there some features you are hoping for?  This will help me
 prioritize
 this effort.

 Michael


 On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote:

  I was looking for a class where reflection-related code should reside.

 I found this but don't think it is the proper class for bridging
 differences between hive 0.12 and 0.13.1:

 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/
 ScalaReflection.scala

 Cheers


 On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote:

  After manually copying hive 0.13.1 jars to local maven repo, I got the
 following errors when building spark-hive_2.10 module :

 [ERROR]

  /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/
 sql/hive/HiveContext.scala:182:

 type mismatch;
   found   : String
   required: Array[String]
 [ERROR]   val proc: CommandProcessor =
 CommandProcessorFactory.get(tokens(0), hiveconf)
 [ERROR]
 ^
 [ERROR]

  /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/
 sql/hive/HiveMetastoreCatalog.scala:60:

 value getAllPartitionsForPruner is not a member of org.apache.
   hadoop.hive.ql.metadata.Hive
 [ERROR] client.getAllPartitionsForPruner(table).toSeq
 [ERROR]^
 [ERROR]

  /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/
 sql/hive/HiveMetastoreCatalog.scala:267:

 overloaded method constructor TableDesc with alternatives:
 (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2:
 Class[_],x$3:

 java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc

 and
()org.apache.hadoop.hive.ql.plan.TableDesc
   cannot be applied to (Class[org.apache.hadoop.hive.
 serde2.Deserializer],
 Class[(some other)?0(in value tableDesc)(in value tableDesc)],

 Class[?0(in

 value tableDesc)(in   value tableDesc)], java.util.Properties)
 [ERROR]   val tableDesc = new TableDesc(
 [ERROR]   ^
 [WARNING] Class org.antlr.runtime.tree.CommonTree not found -
 continuing
 with a stub.
 [WARNING] Class org.antlr.runtime.Token not found - continuing with a

 stub.

 [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with
 a
 stub.
 [ERROR]
   while compiling:

  /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/
 sql/hive/HiveQl.scala

  during phase: typer
   library version: version 2.10.4
  compiler version: version 2.10.4

 The above shows incompatible changes between 0.12 and 0.13.1
 e.g. the first error corresponds to the following method
 in CommandProcessorFactory :
public static CommandProcessor get(String[] cmd, HiveConf conf)
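  (That is, in Hive 0.13 the factory takes a String[] rather than a single String,
  so a call site like the one in HiveContext.scala would need roughly this kind of
  change; a sketch only, not an actual patch:)

  import org.apache.hadoop.hive.ql.processors.{CommandProcessor, CommandProcessorFactory}

  // Hive 0.12-era call, a single command token:
  //   val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
  // Hive 0.13-style call, an array of tokens, roughly:
  val proc: CommandProcessor = CommandProcessorFactory.get(Array(tokens(0)), hiveconf)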

 Cheers


 On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com
 wrote:

  So, do we have a short-term fix until Hive 0.14 comes out? Perhaps

 adding

  the hive-exec jar to the spark-project repo? It doesn't look like
 
  there's

 a release date schedule for 0.14.



 On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote:

   Exactly, forgot to mention the Hulu team also made changes to cope with

 those

  incompatibility issues, but they said that's relatively easy once the
 re-packaging work is done.


 On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com


  wrote:

  I've heard from Cloudera that there were hive internal changes

 between

  0.12 and 0.13 that required code re-writing. Over time it might be
 possible for us to integrate with hive using API's that are more
 stable (this is the domain of Michael/Cheng/Yin more than me!). It
 would be interesting to see what the Hulu folks did.

 - Patrick

 On Mon, Jul 28, 2014 at 

Re: Storage Handlers in Spark SQL

2014-08-25 Thread Michael Armbrust
- dev list
+ user list

You should be able to query Spark SQL using JDBC, starting with the 1.1
release.  There is some documentation in the repo
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#running-the-thrift-jdbc-server,
and we'll update the official docs once the release is out.
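For example, once the Thrift JDBC server described in that guide is running, a
client can connect over plain JDBC; a rough sketch (it assumes the Hive JDBC
driver is on the classpath, and the host, port, table and credentials are
placeholders):

import java.sql.DriverManager

// Default HiveServer2-compatible port is 10000; adjust to your deployment.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT count(*) FROM some_table")  // placeholder table
while (rs.next()) println(rs.getLong(1))
conn.close()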


On Thu, Aug 21, 2014 at 4:43 AM, Niranda Perera nira...@wso2.com wrote:

 Hi,

 I have been playing around with Spark for the past few days, and evaluating
 the possibility of migrating to Spark (Spark SQL) from Hive/Hadoop.

 I am working on the WSO2 Business Activity Monitor (WSO2 BAM,

 https://docs.wso2.com/display/BAM241/WSO2+Business+Activity+Monitor+Documentation
 ) which has currently employed Hive. We are considering Spark as a
 successor for Hive, given its performance enhancements.

 We have currently employed several custom storage-handlers in Hive.
 Example:
 WSO2 JDBC and Cassandra storage handlers:
 https://docs.wso2.com/display/BAM241/JDBC+Storage+Handler+for+Hive

 https://docs.wso2.com/display/BAM241/Creating+Hive+Queries+to+Analyze+Data#CreatingHiveQueriestoAnalyzeData-cas

 I would like to know whether Spark SQL can work with these storage
 handlers (while using HiveContext, maybe)?

 Best regards
 --
 *Niranda Perera*
 Software Engineer, WSO2 Inc.
 Mobile: +94-71-554-8430
 Twitter: @n1r44 https://twitter.com/N1R44



Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Matei Zaharia
Was the original issue with Spark 1.1 (i.e. master branch) or an earlier 
release?

One possibility is that your S3 bucket is in a remote Amazon region, which 
would make it very slow. In my experience though saveAsTextFile has worked even 
for pretty large datasets in that situation, so maybe there's something else in 
your job causing a problem. Have you tried other operations on the data, like 
count(), or saving synthetic datasets (e.g. sc.parallelize(1 to 100*1000*1000, 
20).saveAsTextFile(...))?
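(Spelled out, that synthetic test would look roughly like this in spark-shell; the
bucket and path are placeholders:)

// Writes ~100M small records; if this completes quickly, the S3 write path itself
// is fine and the problem is more likely elsewhere in the original job.
sc.parallelize(1 to 100 * 1000 * 1000, 20)
  .saveAsTextFile("s3n://your-bucket/synthetic-test")   // placeholder bucket/path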

Matei

On August 25, 2014 at 12:09:25 PM, amnonkhen (amnon...@gmail.com) wrote:

Hi jerryye, 
Maybe if you voted up my question on Stack Overflow it would get some 
traction and we would get nearer to a solution. 
Thanks, 
Amnon 






Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
One other idea - when things freeze up, try to run jstack on the spark
shell process and on the executors and attach the results. It could be that
somehow you are encountering a deadlock somewhere.


On Mon, Aug 25, 2014 at 1:26 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 Was the original issue with Spark 1.1 (i.e. master branch) or an earlier
 release?

 One possibility is that your S3 bucket is in a remote Amazon region, which
 would make it very slow. In my experience though saveAsTextFile has worked
 even for pretty large datasets in that situation, so maybe there's
 something else in your job causing a problem. Have you tried other
 operations on the data, like count(), or saving synthetic datasets (e.g.
 sc.parallelize(1 to 100*1000*1000, 20).saveAsTextFile(...)?

 Matei

 On August 25, 2014 at 12:09:25 PM, amnonkhen (amnon...@gmail.com) wrote:

 Hi jerryye,
 Maybe if you voted up my question on Stack Overflow it would get some
 traction and we would get nearer to a solution.
 Thanks,
 Amnon







Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Patrick Wendell
Hey Nicholas,

That seems promising - I prefer having a proper link to having that fairly
verbose comment though, because in some cases there will be dozens of
comments and it could get lost. I wonder if they could do something where
it posts a link instead...

- Patrick


On Mon, Aug 25, 2014 at 11:06 AM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 FYI: Looks like the Mesos folk also have a bot to do automatic linking,
 but it appears to have been provided to them somehow by ASF.

 See this comment as an example:
 https://issues.apache.org/jira/browse/MESOS-1688?focusedCommentId=14109078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14109078

 Might be a small win to push this work to a bot ASF manages if we can get
 access to it (and if we have no concerns about depending on another
 external service).

 Nick


 On Mon, Aug 11, 2014 at 4:10 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Thanks for looking into this. I think little tools like this are super
 helpful.

 Would it hurt to open a request with INFRA to install/configure the
 JIRA-GitHub plugin while we continue to use the Python script we have? I
 wouldn't mind opening that JIRA issue with them.

 Nick


 On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 I spent some time on this and I'm not sure either of these is an option,
 unfortunately.

 We typically can't use custom JIRA plug-ins because this JIRA is
 controlled by the ASF and we don't have rights to modify most things about
 how it works (it's a large shared JIRA instance used by more than 50
 projects). It's worth looking into whether they can do something. In
 general we've tended to avoid going through ASF infra whenever
 possible, since they are generally overloaded and things move very slowly,
 even when there are outages.

 Here is the script we use to do the sync:
 https://github.com/apache/spark/blob/master/dev/github_jira_sync.py

 It might be possible to modify this to support post-hoc changes, but
 we'd need to think about how to do so while minimizing function calls to
 the ASF JIRA API, which I found are very slow.

 - Patrick



 On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It looks like this script doesn't catch PRs that are opened and *then*
 have

 the JIRA issue ID added to the name. Would it be easy to somehow have
 the
 script trigger on PR name changes as well as PR creates?

 Alternately, is there a reason we can't or don't want to use the plugin
 mentioned below? (I'm assuming it covers cases like this, but I'm not
 sure.)

 Nick



 On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

  By the way, it looks like there's a JIRA plugin that integrates it
 with
  GitHub:
 
 -
 
 https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin

 -
 
 https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA
 
  It does the automatic linking and shows some additional information
  
 https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png
 

  that might be nice to have for heavy JIRA users.
 
  Nick
 
 
 
  On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com
 
  wrote:
 
  Yeah it needs to have SPARK-XXX in the title (this is the format we
  request already). It just works with a small synchronization script I
  wrote that we run every five minutes on Jenkins that uses the Github
  and Jenkins API:
 
 
 
 https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929
 
  - Patrick
 
  On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   That's pretty neat.
  
   How does it work? Do we just need to put the issue ID (e.g.
 SPARK-1234)
   anywhere in the pull request?
  
   Nick
  
  
   On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
  
   Just a small note, today I committed a tool that will
 automatically
   mirror pull requests to JIRA issues, so contributors will no
 longer
   have to manually post a pull request on the JIRA when they make
 one.
  
   It will create a link on the JIRA and also make a comment to
 trigger
   an e-mail to people watching.
  
   This should make some things easier, such as avoiding accidental
   duplicate effort on the same JIRA.
  
   - Patrick
  
 
 
 







Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread jerryye
Hi Matei,
At least in my case, the s3 bucket is in the same region. Running count()
works and so does generating synthetic data. What I saw was that the job
would hang for over an hour with no progress but tasks would immediately
start finishing if I cached the data.

- jerry


On Mon, Aug 25, 2014 at 1:26 PM, Matei Zaharia [via Apache Spark Developers
List] ml-node+s1001551n8000...@n3.nabble.com wrote:

 Was the original issue with Spark 1.1 (i.e. master branch) or an earlier
 release?

 One possibility is that your S3 bucket is in a remote Amazon region, which
 would make it very slow. In my experience though saveAsTextFile has worked
 even for pretty large datasets in that situation, so maybe there's
 something else in your job causing a problem. Have you tried other
 operations on the data, like count(), or saving synthetic datasets (e.g.
 sc.parallelize(1 to 100*1000*1000, 20).saveAsTextFile(...)?

 Matei

 On August 25, 2014 at 12:09:25 PM, amnonkhen wrote:

 Hi jerryye,
 Maybe if you voted up my question on Stack Overflow it would get some
 traction and we would get nearer to a solution.
 Thanks,
 Amnon












Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
Hi Matei,

I'm going to investigate from both the Mesos and Spark sides and will hopefully
have a good long-term solution. In the meantime, having a workaround
to start with is going to unblock folks.

Tim

On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
 Anyway it would be good if someone from the Mesos side investigates this and
 proposes a solution. The 32 MB per task hack isn't completely foolproof
 either (e.g. people might allocate all the RAM to their executor and thus
 stop being able to launch tasks), so maybe we wait on a Mesos fix for this
 one.

 Matei

 On August 25, 2014 at 1:07:15 PM, Matei Zaharia (matei.zaha...@gmail.com)
 wrote:

 This is kind of weird then, seems perhaps unrelated to this issue (or at
 least to the way I understood it). Is the problem maybe that Mesos saw 0 MB
 being freed and didn't re-offer the machine *even though there was more than
 32 MB free overall*?

 Matei

 On August 25, 2014 at 12:59:59 PM, Cody Koeninger (c...@koeninger.org)
 wrote:

 I definitely saw a case where

 a. the only job running was a 256m shell
 b. I started a 2g job
 c. a little while later the same user as in a started another 256m shell

 My job immediately stopped making progress.  Once user a killed his shells,
 it started again.

 This is on nodes with ~15G of memory, on which we have successfully run 8G
 jobs.


 On Mon, Aug 25, 2014 at 2:02 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 BTW it seems to me that even without that patch, you should be getting
 tasks launched as long as you leave at least 32 MB of memory free on each
 machine (that is, the sum of the executor memory sizes is not exactly the
 same as the total size of the machine). Then Mesos will be able to re-offer
 that machine whenever CPUs free up.

 Matei

 On August 25, 2014 at 5:05:56 AM, Gary Malouf (malouf.g...@gmail.com)
 wrote:

 We have not tried the work-around because there are other bugs in there
 that affected our set-up, though it seems it would help.


 On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote:

  +1 to have the work around in.
 
  I'll be investigating from the Mesos side too.
 
  Tim
 
  On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com
  wrote:
   Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's
   too
  bad that this happens in fine-grained mode -- would be really good to
  fix.
  I'll see if we can get the workaround in
  https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally
  have you tried that?
  
   Matei
  
   On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com)
  wrote:
  
   Hi Matei,
  
   We have an analytics team that uses the cluster on a daily basis. They
  use two types of 'run modes':
  
   1) For running actual queries, they set the spark.executor.memory to
  something between 4 and 8GB of RAM/worker.
  
   2) A shell that takes a minimal amount of memory on workers (128MB)
   for
  prototyping out a larger query. This allows them to not take up RAM on
  the
  cluster when they do not really need it.
  
   We see the deadlocks when there are a few shells in either case. From
  the usage patterns we have, coarse-grained mode would be a challenge as
  we
  have to constantly remind people to kill their shells as soon as their
  queries finish.
  
   Am I correct in viewing Mesos in coarse-grained mode as being similar
   to
  Spark Standalone's cpu allocation behavior?
  
  
  
  
   On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia
   matei.zaha...@gmail.com
  wrote:
   Hey Gary, just as a workaround, note that you can use Mesos in
  coarse-grained mode by setting spark.mesos.coarse=true. Then it will
  hold
  onto CPUs for the duration of the job.
  
   Matei
  
   On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com)
  wrote:
  
   I just wanted to bring up a significant Mesos/Spark issue that makes
   the
   combo difficult to use for teams larger than 4-5 people. It's covered
   in
   https://issues.apache.org/jira/browse/MESOS-1688. My understanding is
  that
   Spark's use of executors in fine-grained mode is a very different
  behavior
   than many of the other common frameworks for Mesos.
  
 






RE: Working Formula for Hive 0.13?

2014-08-25 Thread Andrew Lee
From my perspective, there are a few benefits to Hive 0.13.1+. The 
following are the four major reasons I can see for people asking to upgrade 
to Hive 0.13.1 recently.
1. Performance and bug fix, patches. (Usual case)
2. Native support for Parquet format, no need to provide custom JARs and SerDe 
like Hive 0.12. (Depends, driven by data format and queries)
3. Support of Tez engine which gives performance improvement in several use 
cases (Performance improvement)
4. Security enhancement in Hive 0.13.1 has improved a lot (Security concerns, 
ACLs, etc)
These are the major benefits I see to upgrading to Hive 0.13.1+ from Hive 0.12.0.
There may be others out there that I'm not aware of, but I do see it coming.
my 2 cents.
 From: mich...@databricks.com
 Date: Mon, 25 Aug 2014 13:08:42 -0700
 Subject: Re: Working Formula for Hive 0.13?
 To: wangf...@huawei.com
 CC: dev@spark.apache.org
 
  Thanks for working on this!  It's unclear at the moment exactly how we are
  going to handle this, since the end goal is to be compatible with as many
  versions of Hive as possible.  That said, I think it would be great to open
  a PR in this case.  Even if we don't merge it, that's a good way to get it
  on people's radar and have a discussion about the changes that are required.
 
 
 On Sun, Aug 24, 2014 at 7:11 PM, scwf wangf...@huawei.com wrote:
 
I have worked for a branch update the hive version to hive-0.13(by
  org.apache.hive)---https://github.com/scwf/spark/tree/hive-0.13
  I am wondering whether it's ok to make a PR now because hive-0.13 version
  is not compatible with hive-0.12 and here i used org.apache.hive.
 
 
 
  On 2014/7/29 8:22, Michael Armbrust wrote:
 
  A few things:
- When we upgrade to Hive 0.13.0, Patrick will likely republish the
  hive-exec jar just as we did for 0.12.0
- Since we have to tie into some pretty low level APIs it is
  unsurprising
  that the code doesn't just compile out of the box against 0.13.0
- ScalaReflection is for determining Schema from Scala classes, not
  reflection based bridge code.  Either way its unclear to if there is any
  reason to use reflection to support multiple versions, instead of just
  upgrading to Hive 0.13.0
 
  One question I have is, What is the goal of upgrading to hive 0.13.0?  Is
  it purely because you are having problems connecting to newer metastores?
Are there some features you are hoping for?  This will help me
  prioritize
  this effort.
 
  Michael
 
 
  On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   I was looking for a class where reflection-related code should reside.
 
  I found this but don't think it is the proper class for bridging
  differences between hive 0.12 and 0.13.1:
 
  sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/
  ScalaReflection.scala
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   After manually copying hive 0.13.1 jars to local maven repo, I got the
  following errors when building spark-hive_2.10 module :
 
  [ERROR]
 
   /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/
  sql/hive/HiveContext.scala:182:
 
  type mismatch;
found   : String
required: Array[String]
  [ERROR]   val proc: CommandProcessor =
  CommandProcessorFactory.get(tokens(0), hiveconf)
  [ERROR]
  ^
  [ERROR]
 
   /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/
  sql/hive/HiveMetastoreCatalog.scala:60:
 
  value getAllPartitionsForPruner is not a member of org.apache.
hadoop.hive.ql.metadata.Hive
  [ERROR] client.getAllPartitionsForPruner(table).toSeq
  [ERROR]^
  [ERROR]
 
   /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/
  sql/hive/HiveMetastoreCatalog.scala:267:
 
  overloaded method constructor TableDesc with alternatives:
 (x$1: Class[_ : org.apache.hadoop.mapred.InputFormat[_, _]],x$2:
  Class[_],x$3:
 
  java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc
 
  and
 ()org.apache.hadoop.hive.ql.plan.TableDesc
cannot be applied to (Class[org.apache.hadoop.hive.
  serde2.Deserializer],
  Class[(some other)?0(in value tableDesc)(in value tableDesc)],
 
  Class[?0(in
 
  value tableDesc)(in   value tableDesc)], java.util.Properties)
  [ERROR]   val tableDesc = new TableDesc(
  [ERROR]   ^
  [WARNING] Class org.antlr.runtime.tree.CommonTree not found -
  continuing
  with a stub.
  [WARNING] Class org.antlr.runtime.Token not found - continuing with a
 
  stub.
 
  [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with
  a
  stub.
  [ERROR]
while compiling:
 
   /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/
  sql/hive/HiveQl.scala
 
   during phase: typer
library version: version 2.10.4
   compiler version: version 2.10.4
 
  The above shows incompatible changes between 0.12 and 0.13.1
  e.g. the first error corresponds to the following method
  in CommandProcessorFactory :
 public static 

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread jerryye
Hi Patrick,
Here's the process:
java -cp
/root/ephemeral-hdfs/conf/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly-1.1.1-SNAPSHOT-hadoop1.0.4.jar
-XX:MaxPermSize=128m -Djava.library.path=/root/ephemeral-hdfs/lib/native/
-Xms5g -Xmx10g -XX:MaxPermSize=10g -Dspark.akka.timeout=300
-Dspark.driver.port=59156 -Xms5g -Xmx10g -XX:MaxPermSize=10g -Xms58315M
-Xmx58315M org.apache.spark.executor.CoarseGrainedExecutorBackend
akka.tcp://sp...@ip-10-226-198-178.us-west-2.compute.internal:59156/user/CoarseGrainedScheduler
5 ip-10-38-9-181.us-west-2.compute.internal 8
akka.tcp://sparkwor...@ip-10-38-9-181.us-west-2.compute.internal:34533/user/Worker
app-20140825214225-0001

Attached is the requested stack trace.



On Mon, Aug 25, 2014 at 1:35 PM, Patrick Wendell [via Apache Spark
Developers List] ml-node+s1001551n8001...@n3.nabble.com wrote:

 One other idea - when things freeze up, try to run jstack on the spark
 shell process and on the executors and attach the results. It could be
 that
 somehow you are encountering a deadlock somewhere.


  On Mon, Aug 25, 2014 at 1:26 PM, Matei Zaharia wrote:

  Was the original issue with Spark 1.1 (i.e. master branch) or an earlier
  release?
 
  One possibility is that your S3 bucket is in a remote Amazon region,
 which
  would make it very slow. In my experience though saveAsTextFile has
 worked
  even for pretty large datasets in that situation, so maybe there's
  something else in your job causing a problem. Have you tried other
  operations on the data, like count(), or saving synthetic datasets (e.g.
  sc.parallelize(1 to 100*1000*1000, 20).saveAsTextFile(...)?
 
  Matei
 
   On August 25, 2014 at 12:09:25 PM, amnonkhen wrote:
 
  Hi jerryye,
  Maybe if you voted up my question on Stack Overflow it would get some
  traction and we would get nearer to a solution.
  Thanks,
  Amnon
 
 
 
 
 





jstack.txt (92K) 
http://apache-spark-developers-list.1001551.n3.nabble.com/attachment/8006/0/jstack.txt





Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-25 Thread Graham Dennis
Hi,

Unless you manually patched Spark, if you have Reynold’s patch for SPARK-2878, 
you also have the patch for SPARK-2893 which makes the underlying cause much 
more obvious and explicit.  So the below is unlikely to be related to 
SPARK-2878.

Graham

On 26 Aug 2014, at 4:13 am, npanj nitinp...@gmail.com wrote:

 I am running the code with @rxin's patch in standalone mode.  In my case I am
 registering org.apache.spark.graphx.GraphKryoRegistrator . 
 
 Recently I started to see com.esotericsoftware.kryo.KryoException:
 java.io.IOException: failed to uncompress the chunk: PARSING_ERROR . Has
  anyone seen this? Could it be related to this issue?  Here is the trace: 
 --
 vids (org.apache.spark.graphx.impl.VertexAttributeBlock)
com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
com.esotericsoftware.kryo.io.Input.require(Input.java:169)
com.esotericsoftware.kryo.io.Input.readLong_slow(Input.java:710)
com.esotericsoftware.kryo.io.Input.readLong(Input.java:665)
 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:127)
 
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:107)
com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 
 org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1054)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 
 org.apache.spark.graphx.impl.VertexPartitionBaseOps.innerJoinKeepLeft(VertexPartitionBaseOps.scala:192)
 
 org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:78)
 
 org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
 
 org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 
 org.apache.spark.graphx.EdgeRDD$$anonfun$mapEdgePartitions$1.apply(EdgeRDD.scala:87)
 
 org.apache.spark.graphx.EdgeRDD$$anonfun$mapEdgePartitions$1.apply(EdgeRDD.scala:85)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:202)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 
 --
 
 
 
 
 



Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread amnonkhen
Hi Matei,
The original issue happened on a spark-1.0.2-bin-hadoop2 installation.
I will try the synthetic operation and see if I get the same results or not.
Amnon


On Mon, Aug 25, 2014 at 11:26 PM, Matei Zaharia [via Apache Spark
Developers List] ml-node+s1001551n8000...@n3.nabble.com wrote:

 Was the original issue with Spark 1.1 (i.e. master branch) or an earlier
 release?

 One possibility is that your S3 bucket is in a remote Amazon region, which
 would make it very slow. In my experience though saveAsTextFile has worked
 even for pretty large datasets in that situation, so maybe there's
 something else in your job causing a problem. Have you tried other
 operations on the data, like count(), or saving synthetic datasets (e.g.
 sc.parallelize(1 to 100*1000*1000, 20).saveAsTextFile(...)?

 Matei

 On August 25, 2014 at 12:09:25 PM, amnonkhen wrote:

 Hi jerryye,
 Maybe if you voted up my question on Stack Overflow it would get some
 traction and we would get nearer to a solution.
 Thanks,
 Amnon











--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7795p8008.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
My problem is that I'm not sure this workaround would solve things, given the 
issue described here (where there was a lot of memory free but it didn't get 
re-offered). If you think it does, it would be good to explain why it behaves 
like that.

Matei
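
Two workarounds come up in the quoted thread below: the fine-grained
minimum-memory patch in https://github.com/apache/spark/pull/1860, and running
Spark on Mesos in coarse-grained mode. The coarse-grained option is just a
configuration switch; a minimal sketch, where the Mesos master URL, app name,
and memory size are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Coarse-grained mode: Spark acquires CPUs up front and holds them for the
    // lifetime of the application instead of launching one Mesos task per Spark task.
    val conf = new SparkConf()
      .setAppName("analytics-shell")
      .setMaster("mesos://zk://zk1:2181,zk2:2181/mesos")
      .set("spark.mesos.coarse", "true")
      .set("spark.executor.memory", "4g")
    val sc = new SparkContext(conf)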

On August 25, 2014 at 2:28:18 PM, Timothy Chen (tnac...@gmail.com) wrote:

Hi Matei, 

I'm going to investigate from both Mesos and Spark side will hopefully 
have a good long term solution. In the mean time having a work around 
to start with is going to unblock folks. 

Tim 

On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote: 
 Anyway it would be good if someone from the Mesos side investigates this and 
 proposes a solution. The 32 MB per task hack isn't completely foolproof 
 either (e.g. people might allocate all the RAM to their executor and thus 
 stop being able to launch tasks), so maybe we wait on a Mesos fix for this 
 one. 
 
 Matei 
 
 On August 25, 2014 at 1:07:15 PM, Matei Zaharia (matei.zaha...@gmail.com) 
 wrote: 
 
 This is kind of weird then, seems perhaps unrelated to this issue (or at 
 least to the way I understood it). Is the problem maybe that Mesos saw 0 MB 
 being freed and didn't re-offer the machine *even though there was more than 
 32 MB free overall*? 
 
 Matei 
 
 On August 25, 2014 at 12:59:59 PM, Cody Koeninger (c...@koeninger.org) 
 wrote: 
 
 I definitely saw a case where 
 
 a. the only job running was a 256m shell 
 b. I started a 2g job 
 c. a little while later the same user as in a started another 256m shell 
 
 My job immediately stopped making progress. Once user a killed his shells, 
 it started again. 
 
 This is on nodes with ~15G of memory, on which we have successfully run 8G 
 jobs. 
 
 
 On Mon, Aug 25, 2014 at 2:02 PM, Matei Zaharia matei.zaha...@gmail.com 
 wrote: 
 
 BTW it seems to me that even without that patch, you should be getting 
 tasks launched as long as you leave at least 32 MB of memory free on each 
 machine (that is, the sum of the executor memory sizes is not exactly the 
 same as the total size of the machine). Then Mesos will be able to re-offer 
 that machine whenever CPUs free up. 
 
 Matei 
 
 On August 25, 2014 at 5:05:56 AM, Gary Malouf (malouf.g...@gmail.com) 
 wrote: 
 
 We have not tried the work-around because there are other bugs in there 
 that affected our set-up, though it seems it would help. 
 
 
 On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote: 
 
  +1 to have the work around in. 
  
  I'll be investigating from the Mesos side too. 
  
  Tim 
  
  On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com 
  wrote: 
   Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's 
   too 
  bad that this happens in fine-grained mode -- would be really good to 
  fix. 
  I'll see if we can get the workaround in 
  https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally 
  have you tried that? 
   
   Matei 
   
   On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com) 
  wrote: 
   
   Hi Matei, 
   
   We have an analytics team that uses the cluster on a daily basis. They 
  use two types of 'run modes': 
   
   1) For running actual queries, they set the spark.executor.memory to 
  something between 4 and 8GB of RAM/worker. 
   
   2) A shell that takes a minimal amount of memory on workers (128MB) 
   for 
  prototyping out a larger query. This allows them to not take up RAM on 
  the 
  cluster when they do not really need it. 
   
   We see the deadlocks when there are a few shells in either case. From 
  the usage patterns we have, coarse-grained mode would be a challenge as 
  we 
  have to constantly remind people to kill their shells as soon as their 
  queries finish. 
   
   Am I correct in viewing Mesos in coarse-grained mode as being similar 
   to 
  Spark Standalone's cpu allocation behavior? 
   
   
   
   
   On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia 
   matei.zaha...@gmail.com 
  wrote: 
   Hey Gary, just as a workaround, note that you can use Mesos in 
  coarse-grained mode by setting spark.mesos.coarse=true. Then it will 
  hold 
  onto CPUs for the duration of the job. 
   
   Matei 
   
   On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com) 
  wrote: 
   
   I just wanted to bring up a significant Mesos/Spark issue that makes 
   the 
   combo difficult to use for teams larger than 4-5 people. It's covered 
   in 
   https://issues.apache.org/jira/browse/MESOS-1688. My understanding is 
  that 
   Spark's use of executors in fine-grained mode is a very different 
  behavior 
   than many of the other common frameworks for Mesos. 
   
  
 
 


Re: [Spark SQL] off-heap columnar store

2014-08-25 Thread Henry Saputra
Hi Michael,

This is great news.
Is there any initial proposal or design for the Tachyon caching that you
can share so far?

I don't think there is a JIRA ticket open to track this feature yet.

- Henry

On Mon, Aug 25, 2014 at 1:13 PM, Michael Armbrust
mich...@databricks.com wrote:

 What is the plan for getting Tachyon/off-heap support for the columnar
 compressed store?  It's not in 1.1 is it?


 It is not in 1.1 and there are not concrete plans for adding it at this
 point.  Currently, there is more engineering investment going into caching
 parquet data in Tachyon instead.  This approach is going to have much
 better support for nested data, leverages other work being done on parquet,
 and alleviates your concerns about wire format compatibility.

 That said, if someone really wants to try and implement it, I don't think
 it would be very hard.  The primary issue is going to be designing a clean
 interface that is not too tied to this one implementation.


 Also, how likely is the wire format for the columnar compressed data
 to change?  That would be a problem for write-through or persistence.


 We aren't making any guarantees at the moment that it won't change.  Its
 currently only intended for temporary caching of data.
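
One way to experiment with the Parquet-on-Tachyon direction today, without
waiting for dedicated support, is simply to write a Parquet file to a Tachyon
path and read it back through Spark SQL. A rough sketch, not the design being
discussed above; it assumes the Tachyon Hadoop client is on the classpath, and
the host, port, and paths are placeholders:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)          // sc: an existing SparkContext
    import sqlContext.createSchemaRDD            // implicit RDD[case class] -> SchemaRDD

    case class Event(id: Int, value: Double)
    val events = sc.parallelize(1 to 1000).map(i => Event(i, i * 0.5))

    // Write Parquet into Tachyon, then read it back through Spark SQL.
    events.saveAsParquetFile("tachyon://tachyon-master:19998/tmp/events.parquet")
    val cached = sqlContext.parquetFile("tachyon://tachyon-master:19998/tmp/events.parquet")
    println(cached.count())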




Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Matei Zaharia
Got it. Another thing that would help is if you spot any exceptions or failed 
tasks in the web UI (http://driver:4040).

Matei

On August 25, 2014 at 3:07:41 PM, amnonkhen (amnon...@gmail.com) wrote:

Hi Matei, 
The original issue happened on a spark-1.0.2-bin-hadoop2 installation. 
I will try the synthetic operation and see if I get the same results or not. 
Amnon 


On Mon, Aug 25, 2014 at 11:26 PM, Matei Zaharia [via Apache Spark 
Developers List] ml-node+s1001551n8000...@n3.nabble.com wrote: 

 Was the original issue with Spark 1.1 (i.e. master branch) or an earlier 
 release? 
 
 One possibility is that your S3 bucket is in a remote Amazon region, which 
 would make it very slow. In my experience though saveAsTextFile has worked 
 even for pretty large datasets in that situation, so maybe there's 
 something else in your job causing a problem. Have you tried other 
 operations on the data, like count(), or saving synthetic datasets (e.g. 
 sc.parallelize(1 to 100*1000*1000, 20).saveAsTextFile(...)? 
 
 Matei 
 
 On August 25, 2014 at 12:09:25 PM, amnonkhen ([hidden email] 
 http://user/SendEmail.jtp?type=nodenode=8000i=0) wrote: 
 
 Hi jerryye, 
 Maybe if you voted up my question on Stack Overflow it would get some 
 traction and we would get nearer to a solution. 
 Thanks, 
 Amnon 
 
 
 
 -- 
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7795p7991.html
  
 
 Sent from the Apache Spark Developers List mailing list archive at 
 Nabble.com. 
 
 
 
 
  
 




-- 
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7795p8008.html
 
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
I don't think it solves Cody's problem, which still needs more investigation,
but I believe it does solve the problem you described earlier.

I just confirmed with the Mesos folks that we no longer need the minimum memory
requirement, so we'll be dropping that soon, and the workaround might not be
needed for the next Mesos release.

Tim

On Mon, Aug 25, 2014 at 3:06 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
 My problem is that I'm not sure this workaround would solve things, given
 the issue described here (where there was a lot of memory free but it didn't
 get re-offered). If you think it does, it would be good to explain why it
 behaves like that.

 Matei

 On August 25, 2014 at 2:28:18 PM, Timothy Chen (tnac...@gmail.com) wrote:

 Hi Matei,

 I'm going to investigate from both Mesos and Spark side will hopefully
 have a good long term solution. In the mean time having a work around
 to start with is going to unblock folks.

 Tim

 On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
 Anyway it would be good if someone from the Mesos side investigates this
 and
 proposes a solution. The 32 MB per task hack isn't completely foolproof
 either (e.g. people might allocate all the RAM to their executor and thus
 stop being able to launch tasks), so maybe we wait on a Mesos fix for this
 one.

 Matei

 On August 25, 2014 at 1:07:15 PM, Matei Zaharia (matei.zaha...@gmail.com)
 wrote:

 This is kind of weird then, seems perhaps unrelated to this issue (or at
 least to the way I understood it). Is the problem maybe that Mesos saw 0
 MB
 being freed and didn't re-offer the machine *even though there was more
 than
 32 MB free overall*?

 Matei

 On August 25, 2014 at 12:59:59 PM, Cody Koeninger (c...@koeninger.org)
 wrote:

 I definitely saw a case where

 a. the only job running was a 256m shell
 b. I started a 2g job
 c. a little while later the same user as in a started another 256m shell

 My job immediately stopped making progress. Once user a killed his shells,
 it started again.

 This is on nodes with ~15G of memory, on which we have successfully run 8G
 jobs.


 On Mon, Aug 25, 2014 at 2:02 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 BTW it seems to me that even without that patch, you should be getting
 tasks launched as long as you leave at least 32 MB of memory free on each
 machine (that is, the sum of the executor memory sizes is not exactly the
 same as the total size of the machine). Then Mesos will be able to
 re-offer
 that machine whenever CPUs free up.

 Matei

 On August 25, 2014 at 5:05:56 AM, Gary Malouf (malouf.g...@gmail.com)
 wrote:

 We have not tried the work-around because there are other bugs in there
 that affected our set-up, though it seems it would help.


 On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote:

  +1 to have the work around in.
 
  I'll be investigating from the Mesos side too.
 
  Tim
 
  On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia
  matei.zaha...@gmail.com
  wrote:
   Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's
   too
  bad that this happens in fine-grained mode -- would be really good to
  fix.
  I'll see if we can get the workaround in
  https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally
  have you tried that?
  
   Matei
  
   On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com)
  wrote:
  
   Hi Matei,
  
   We have an analytics team that uses the cluster on a daily basis.
   They
  use two types of 'run modes':
  
   1) For running actual queries, they set the spark.executor.memory to
  something between 4 and 8GB of RAM/worker.
  
   2) A shell that takes a minimal amount of memory on workers (128MB)
   for
  prototyping out a larger query. This allows them to not take up RAM on
  the
  cluster when they do not really need it.
  
   We see the deadlocks when there are a few shells in either case. From
  the usage patterns we have, coarse-grained mode would be a challenge as
  we
  have to constantly remind people to kill their shells as soon as their
  queries finish.
  
   Am I correct in viewing Mesos in coarse-grained mode as being similar
   to
  Spark Standalone's cpu allocation behavior?
  
  
  
  
   On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia
   matei.zaha...@gmail.com
  wrote:
   Hey Gary, just as a workaround, note that you can use Mesos in
  coarse-grained mode by setting spark.mesos.coarse=true. Then it will
  hold
  onto CPUs for the duration of the job.
  
   Matei
  
   On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com)
  wrote:
  
   I just wanted to bring up a significant Mesos/Spark issue that makes
   the
   combo difficult to use for teams larger than 4-5 people. It's covered
   in
   https://issues.apache.org/jira/browse/MESOS-1688. My understanding is
  that
   Spark's use of executors in fine-grained mode is a very different
  behavior
   than many of the other common frameworks for Mesos.
  
 



too many CancelledKeyExceptions thrown from ConnectionManager

2014-08-25 Thread yao
Hi Folks,

We are testing our home-made KMeans algorithm using Spark on Yarn.
Recently, we've found that the application fails frequently when clustering
over 300,000,000 users (each user is represented by a feature vector, and the
whole data set is around 600,000,000). After digging into the job log, we found
many CancelledKeyExceptions thrown by ConnectionManager, but no other
exceptions. We suspect the frequent CancelledKeyExceptions bring the whole
application down, since the application often fails on the third or fourth
iteration for large datasets. Any directional suggestions are welcome.

*Errors in job log*:
java.nio.channels.CancelledKeyException
at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
14/08/25 19:04:32 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(lsv-289.rfiserve.net,43199)
14/08/25 19:04:32 ERROR ConnectionManager: Corresponding SendingConnectionManagerId not found
14/08/25 19:04:32 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@2570cd62
14/08/25 19:04:32 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@2570cd62
java.nio.channels.CancelledKeyException
at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
14/08/25 19:04:32 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(lsv-289.rfiserve.net,56727)
14/08/25 19:04:32 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(lsv-289.rfiserve.net,56727)
14/08/25 19:04:32 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(lsv-289.rfiserve.net,56727)
14/08/25 19:04:32 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@37c8b85a
14/08/25 19:04:32 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@37c8b85a
java.nio.channels.CancelledKeyException
at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:287)
at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
14/08/25 19:04:32 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(lsv-668.rfiserve.net,41913)
14/08/25 19:04:32 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(lsv-668.rfiserve.net,41913)
14/08/25 19:04:32 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@fcea3a4
14/08/25 19:04:32 ERROR ConnectionManager: Corresponding SendingConnectionManagerId not found
14/08/25 19:04:32 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@fcea3a4


Best
Shengzhe


Re: GraphX seems to be broken while creating a large graph (6B nodes in my case)

2014-08-25 Thread Ankur Dave
I posted the fix on the JIRA ticket 
(https://issues.apache.org/jira/browse/SPARK-3190). To update the user list, 
this is indeed an integer overflow problem when summing up the partition sizes. 
The fix is to use Longs for the sum: https://github.com/apache/spark/pull/2106.

Ankur
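
To make the overflow concrete, a toy illustration of that class of bug (not the
actual GraphX code):

    // 60 partitions x 100 million elements each = 6 billion, which does not fit in an Int.
    val perPartitionCounts = Array.fill(60)(100000000)

    val asInt: Int   = perPartitionCounts.sum                 // wraps around to 1705032704
    val asLong: Long = perPartitionCounts.map(_.toLong).sum   // 6000000000, as expected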





Handling stale PRs

2014-08-25 Thread Nicholas Chammas
Check this out:
https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc

We're hitting close to 300 open PRs. Those are the least recently updated
ones.

I think having a low number of stale (i.e. not recently updated) PRs is a
good thing to shoot for. It doesn't leave contributors hanging (which feels
bad for contributors), and reduces project clutter (which feels bad for
maintainers/committers).

What is our approach to tackling this problem?

I think communicating and enforcing a clear policy on how stale PRs are
handled might be a good way to reduce the number of stale PRs we have
without making contributors feel rejected.

I don't know what such a policy would look like, but it should be
enforceable and lightweight--i.e. it shouldn't feel like a hammer used to
reject people's work, but rather a necessary tool to keep the project's
contributions relevant and manageable.

Nick


RDD replication in Spark

2014-08-25 Thread rapelly kartheek
Hi,

I've exercised the various options available for persist(), including RDD
replication, and I have gone through the classes involved in caching/storing
the RDDs at different levels. The StorageLevel class plays a pivotal role by
recording whether to use memory or disk and whether to replicate the RDD on
multiple nodes. The LocationIterator class iterates over the preferred machines
one by one for each partition that is replicated, and I have a rough idea of
CoalescedRDD. Please correct me if I am wrong.

But I am looking for the code that chooses the resources on which to replicate
the RDDs. Can someone tell me how replication takes place and how the resources
for replication are chosen? Where should I look to understand how replication
happens?



Thank you so much!!!

regards

-Karthik
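
As a starting point: replication is requested through the storage level passed
to persist(), and the choice of replica destinations is made in the block
manager layer (roughly, BlockManager.replicate asking the BlockManagerMaster
for peer block managers), so that is a reasonable place to start reading. A
minimal usage sketch, assuming an existing SparkContext sc and a placeholder
input path:

    import org.apache.spark.storage.StorageLevel

    // The "_2" storage levels request one extra replica of each cached block.
    val data = sc.textFile("hdfs:///data/input")
    data.persist(StorageLevel.MEMORY_AND_DISK_2)
    data.count()   // materializing the RDD stores its blocks, which triggers replication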


Re: Handling stale PRs

2014-08-25 Thread Matei Zaharia
Hey Nicholas,

In general we've been looking at these periodically (at least I have) and 
asking people to close out of date ones, but it's true that the list has gotten 
fairly large. We should probably have an expiry time of a few months and close 
them automatically. I agree that it's daunting to see so many open PRs.

Matei

On August 25, 2014 at 9:03:09 PM, Nicholas Chammas (nicholas.cham...@gmail.com) 
wrote:

Check this out: 
https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc 

We're hitting close to 300 open PRs. Those are the least recently updated 
ones. 

I think having a low number of stale (i.e. not recently updated) PRs is a 
good thing to shoot for. It doesn't leave contributors hanging (which feels 
bad for contributors), and reduces project clutter (which feels bad for 
maintainers/committers). 

What is our approach to tackling this problem? 

I think communicating and enforcing a clear policy on how stale PRs are 
handled might be a good way to reduce the number of stale PRs we have 
without making contributors feel rejected. 

I don't know what such a policy would look like, but it should be 
enforceable and lightweight--i.e. it shouldn't feel like a hammer used to 
reject people's work, but rather a necessary tool to keep the project's 
contributions relevant and manageable. 

Nick 


Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Amnon Khen
There were no failures or exceptions.


On Tue, Aug 26, 2014 at 1:31 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 Got it. Another thing that would help is if you spot any exceptions or
 failed tasks in the web UI (http://driver:4040).

 Matei

 On August 25, 2014 at 3:07:41 PM, amnonkhen (amnon...@gmail.com) wrote:

 Hi Matei,
 The original issue happened on a spark-1.0.2-bin-hadoop2 installation.
 I will try the synthetic operation and see if I get the same results or
 not.
 Amnon


 On Mon, Aug 25, 2014 at 11:26 PM, Matei Zaharia [via Apache Spark
 Developers List] ml-node+s1001551n8000...@n3.nabble.com wrote:

  Was the original issue with Spark 1.1 (i.e. master branch) or an earlier
  release?
 
  One possibility is that your S3 bucket is in a remote Amazon region,
 which
  would make it very slow. In my experience though saveAsTextFile has
 worked
  even for pretty large datasets in that situation, so maybe there's
  something else in your job causing a problem. Have you tried other
  operations on the data, like count(), or saving synthetic datasets (e.g.
  sc.parallelize(1 to 100*1000*1000, 20).saveAsTextFile(...)?
 
  Matei
 
  On August 25, 2014 at 12:09:25 PM, amnonkhen ([hidden email]
  http://user/SendEmail.jtp?type=nodenode=8000i=0) wrote:
 
  Hi jerryye,
  Maybe if you voted up my question on Stack Overflow it would get some
  traction and we would get nearer to a solution.
  Thanks,
  Amnon
 
 
 
  --
  View this message in context:
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7795p7991.html
 
  Sent from the Apache Spark Developers List mailing list archive at
  Nabble.com.
 
 
 
 

 




 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7795p8008.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.




Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
Hey Amnon,

So just to make sure I understand - you also saw the same issue with 1.0.2?
Just asking because whether or not this regresses the 1.0.2 behavior is
important for our own bug tracking.

- Patrick


On Mon, Aug 25, 2014 at 10:22 PM, Amnon Khen amnon...@gmail.com wrote:

 There were no failures nor exceptions.


 On Tue, Aug 26, 2014 at 1:31 AM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

  Got it. Another thing that would help is if you spot any exceptions or
  failed tasks in the web UI (http://driver:4040).
 
  Matei
 
  On August 25, 2014 at 3:07:41 PM, amnonkhen (amnon...@gmail.com) wrote:
 
  Hi Matei,
  The original issue happened on a spark-1.0.2-bin-hadoop2 installation.
  I will try the synthetic operation and see if I get the same results or
  not.
  Amnon
 
 
  On Mon, Aug 25, 2014 at 11:26 PM, Matei Zaharia [via Apache Spark
  Developers List] ml-node+s1001551n8000...@n3.nabble.com wrote:
 
   Was the original issue with Spark 1.1 (i.e. master branch) or an
 earlier
   release?
  
   One possibility is that your S3 bucket is in a remote Amazon region,
  which
   would make it very slow. In my experience though saveAsTextFile has
  worked
   even for pretty large datasets in that situation, so maybe there's
   something else in your job causing a problem. Have you tried other
   operations on the data, like count(), or saving synthetic datasets
 (e.g.
   sc.parallelize(1 to 100*1000*1000, 20).saveAsTextFile(...)?
  
   Matei
  
   On August 25, 2014 at 12:09:25 PM, amnonkhen ([hidden email]
   http://user/SendEmail.jtp?type=nodenode=8000i=0) wrote:
  
   Hi jerryye,
   Maybe if you voted up my question on Stack Overflow it would get some
   traction and we would get nearer to a solution.
   Thanks,
   Amnon
  
  
  
   --
   View this message in context:
  
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7795p7991.html
  
   Sent from the Apache Spark Developers List mailing list archive at
   Nabble.com.
  
  
  
  
 
 
  
 
 
 
 
  --
  View this message in context:
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-to-s3-on-spark-does-not-work-just-hangs-tp7795p8008.html
  Sent from the Apache Spark Developers List mailing list archive at
  Nabble.com.