[jira] [Updated] (SPARK-5821) CTAS command failure when you don't have write permission of the parent directory

2015-02-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-5821:

Summary: CTAS command failure when you don't have write permission of the 
parent directory  (was: [SQL] CTAS command failure when you don't have write 
permission of the parent directory)

 CTAS command failure when you don't have write permission of the parent 
 directory
 --

 Key: SPARK-5821
 URL: https://issues.apache.org/jira/browse/SPARK-5821
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Yanbo Liang

 When you run a CTAS command such as
 CREATE TEMPORARY TABLE jsonTable
 USING org.apache.spark.sql.json.DefaultSource
 OPTIONS (
   path "/a/b/c/d"
 ) AS
 SELECT a, b FROM jt,
 you will run into a failure if you don't have write permission for directory 
 /a/b/c, whether d is a directory or a file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5821) JSONRelation should check if delete is successful for the overwrite operation.

2015-02-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-5821:

Summary: JSONRelation should check if delete is successful for the 
overwrite operation.  (was: CTAS command failure when you don't have write 
permission of the parent directory)

 JSONRelation should check if delete is successful for the overwrite operation.
 --

 Key: SPARK-5821
 URL: https://issues.apache.org/jira/browse/SPARK-5821
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Yanbo Liang

 When you run a CTAS command such as
 CREATE TEMPORARY TABLE jsonTable
 USING org.apache.spark.sql.json.DefaultSource
 OPTIONS (
   path "/a/b/c/d"
 ) AS
 SELECT a, b FROM jt,
 you will run into a failure if you don't have write permission for directory 
 /a/b/c, whether d is a directory or a file.
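 A minimal sketch of the check suggested by the new summary, assuming the Hadoop
 FileSystem API; this is illustrative only, not the actual JSONRelation code:

 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path

 // Hypothetical helper: fail fast when the overwrite target cannot be removed,
 // e.g. because the parent directory is not writable. fs.delete returns false
 // rather than throwing in several cases, so the result must be checked.
 def deleteForOverwrite(pathStr: String, conf: Configuration): Unit = {
   val path = new Path(pathStr)
   val fs = path.getFileSystem(conf)
   if (fs.exists(path) && !fs.delete(path, true)) {
     throw new java.io.IOException(s"Unable to clear output path $path before overwrite")
   }
 }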



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5738) Reuse mutable row for each record at jsonStringToRow

2015-02-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-5738:

Summary: Reuse mutable row for each record at jsonStringToRow  (was: [SQL] 
Reuse mutable row for each record at jsonStringToRow)

 Reuse mutable row for each record at jsonStringToRow
 

 Key: SPARK-5738
 URL: https://issues.apache.org/jira/browse/SPARK-5738
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.3.0
Reporter: Yanbo Liang

 Other table-scan-like operations, such as ParquetTableScan and HiveTableScan, use 
 a reusable mutable row during serialization to reduce garbage. We should apply 
 the same optimization to JSONRelation#buildScan().
 When serializing a JSON string to a row, reuse a mutable row for each record 
 and for each inner nested structure instead of creating a new one every time.
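 A minimal sketch of the reuse pattern (illustrative only, not the actual
 JSONRelation code; it splits on commas instead of parsing JSON to keep the
 example self-contained):

 // Reuse one mutable buffer per partition instead of allocating a new row
 // object for every record.
 def parsePartition(lines: Iterator[String], numFields: Int): Iterator[Array[Any]] = {
   val reusedRow = new Array[Any](numFields)  // allocated once, reused for every record
   lines.map { line =>
     val fields = line.split(",", numFields)
     var i = 0
     while (i < numFields) {
       reusedRow(i) = if (i < fields.length) fields(i) else null
       i += 1
     }
     reusedRow  // consumers must copy the row if they need to keep it across records
   }
 }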



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5821) [SQL] CTAS command failure when you don't have write permission of the parent directory

2015-02-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-5821:

Target Version/s: 1.3.0

 [SQL] CTAS command failure when you don't have write permission of the 
 parent directory
 

 Key: SPARK-5821
 URL: https://issues.apache.org/jira/browse/SPARK-5821
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Yanbo Liang

 When you run a CTAS command such as
 CREATE TEMPORARY TABLE jsonTable
 USING org.apache.spark.sql.json.DefaultSource
 OPTIONS (
   path "/a/b/c/d"
 ) AS
 SELECT a, b FROM jt,
 you will run into a failure if you don't have write permission for directory 
 /a/b/c, whether d is a directory or a file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5825) Failure stopping Services while command line argument is too long

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321839#comment-14321839
 ] 

Apache Spark commented on SPARK-5825:
-

User 'chenghao-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/4611

 Failure stopping Services while command line argument is too long
 -

 Key: SPARK-5825
 URL: https://issues.apache.org/jira/browse/SPARK-5825
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Reporter: Cheng Hao

 Stopping a service in `spark-daemon.sh` confirms the process id by fuzzy-matching 
 the class name; however, this fails if the Java process's argument list is very 
 long (greater than 4096 characters).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321862#comment-14321862
 ] 

Littlestar commented on SPARK-5826:
---

I put some txt files into /testspark/watchdir, and it throws a NullPointerException:

15/02/15 16:18:20 WARN dstream.FileInputDStream: Error finding new files
java.lang.NullPointerException
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$fn$3$1.apply(JavaStreamingContext.scala:329)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$fn$3$1.apply(JavaStreamingContext.scala:329)
at 
org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$isNewFile(FileInputDStream.scala:215)
at 
org.apache.spark.streaming.dstream.FileInputDStream$$anon$3.accept(FileInputDStream.scala:172)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1489)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1523)
at 
org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:174)
at 
org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:132)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285)
at 
org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:35)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285)
at 
org.apache.spark.streaming.dstream.ShuffledDStream.compute(ShuffledDStream.scala:41)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285)
at 
org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
at 
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
at 
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at 
org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:232)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:230)
at scala.util.Try$.apply(Try.scala:161)
at 
org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:230)
at 
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:167)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
at 

[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-15 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321764#comment-14321764
 ] 

Florian Verhein commented on SPARK-5813:


IANAL, but here are my thoughts:

The user ends up downloading it from Oracle and accepting the license terms in 
that process, so as long as they are (or are made) aware, I don't really see a 
problem. It's just providing a mechanism for them to do this, i.e. it's not a 
redistribution issue.
I think a reasonable solution would be to have OpenJDK as the default, 
with OracleJDK as an option that the user must specifically request (and the 
option's documentation indicating that this entails acceptance of a license, 
etc.).

At least, *the above is true in the case where the user builds their own AMI 
(that's the approach I take since it best suits my requirements). With provided 
AMIs I think this is more complex, because I would assume that is 
redistribution*. I guess that applies to any software that is put on the AMI, 
actually... so this may be an issue that needs looking at more generally. 
I don't know how best to approach that case other than adhering to any 
redistribution terms, including these as part of an EULA for spark-ec2/AMIs or 
something?

But with the work [~nchammas] has done, I suppose the easiest way would be to 
provide the public AMIs with OpenJDK, and add an option to build ones with 
OracleJDK if the user is inclined to do this themselves.

Hmmm... is this worthwhile?

 Spark-ec2: Switch to OracleJDK
 --

 Key: SPARK-5813
 URL: https://issues.apache.org/jira/browse/SPARK-5813
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Florian Verhein
Priority: Minor

 Currently spark-ec2 uses OpenJDK; however, it is generally recommended to use the 
 Oracle JDK, especially for Hadoop deployments, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)
Littlestar created SPARK-5826:
-

 Summary: JavaStreamingContext.fileStream cause Configuration 
NotSerializableException
 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor


org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
directory, Class<LongWritable> kClass, Class<Text> vClass, 
Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
newFilesOnly, Configuration conf)

I use JavaStreamingContext.fileStream with 1.3.0/master with a Configuration,
but it throws a strange exception:

java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
at 
org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
at 
org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at 
org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
at 
org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
at 
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
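
A common pattern for getting a Hadoop Configuration through Java serialization
(e.g. DStream checkpointing) is a Serializable wrapper that delegates to the
Configuration's own Writable methods. A minimal sketch of that pattern, purely
illustrative and not the actual fix for this issue:

import java.io.{ObjectInputStream, ObjectOutputStream}
import org.apache.hadoop.conf.Configuration

// Holds a Configuration and serializes it via Writable#write / #readFields,
// since Configuration itself does not implement java.io.Serializable.
class SerializableConfiguration(@transient var value: Configuration) extends Serializable {
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    value.write(out)
  }
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    value = new Configuration(false)
    value.readFields(in)
  }
}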

[jira] [Commented] (SPARK-5816) Add huge backward compatibility warning in DriverWrapper

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321902#comment-14321902
 ] 

Apache Spark commented on SPARK-5816:
-

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/4612

 Add huge backward compatibility warning in DriverWrapper
 

 Key: SPARK-5816
 URL: https://issues.apache.org/jira/browse/SPARK-5816
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Spark Core
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical

 As of Spark 1.3, we provide backward and forward compatibility in standalone 
 cluster mode through the REST submission gateway. HOWEVER, it nevertheless 
 goes through the legacy o.a.s.deploy.DriverWrapper, and the semantics of the 
 command line arguments there must not change. For instance, this was broken 
 in commit 20a6013106b56a1a1cc3e8cda092330ffbe77cc3.
 There is currently no warning against that in the class and so we should add 
 one before it's too late.
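 A hedged sketch of the kind of warning the issue asks for (the wording and the 
 stubbed-out body are illustrative, not the actual Spark source):

 // WARNING: the REST submission gateway relies on this legacy entry point for
 // backward and forward compatibility in standalone cluster mode. Do NOT change
 // the meaning, order, or number of its command-line arguments.
 object DriverWrapper {
   def main(args: Array[String]): Unit = {
     // ... existing launch logic, unchanged ...
   }
 }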



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5826:
--
Attachment: TestStream.java

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream with 1.3.0/master with a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at 

[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-02-15 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321777#comment-14321777
 ] 

Yi Zhou commented on SPARK-5791:


For the same input dataset size, it takes about ~2 minutes on Hive on M/R with 
optimization parameters but about ~1 hour on Spark SQL.

 [Spark SQL] show poor performance when multiple table do join operation
 ---

 Key: SPARK-5791
 URL: https://issues.apache.org/jira/browse/SPARK-5791
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: Yi Zhou

 Spark SQL shows poor performance when multiple tables are joined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321776#comment-14321776
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 1:48 AM:
---

Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host).

The JVM uses the proxy settings from my default browser (as indicated in the 
Java OS X Preferences).

I just tried another setting: Use proxy server (Advanced: For all protocols, on 
and off), Bypass proxy server for local addresses. This does not work either.


was (Author: lebigot):
Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host).

The JVM uses the proxy settings from my default browser (as indicated in the 
Java OS X Preferences). I will try with other settings.

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Driver stacktrace:
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
 at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at 

[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321776#comment-14321776
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 2:20 AM:
---

Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default), which is to use the system-wide proxy setting 
(from Apple > System Preferences > Network > Advanced > Proxies).

I just tried the following settings, to no avail (they all fail with the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.


was (Author: lebigot):
Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default).

I just tried the following settings, to no avail (they all fail with the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Driver stacktrace:
 at 
 

[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321776#comment-14321776
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 2:26 AM:
---

Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host, with localhost bypassed by the 
proxy, as it should be—the goal is mostly to get access to Google).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default), which is to use the system-wide proxy setting 
(from Apple > System Preferences > Network > Advanced > Proxies).

I just tried the following other JVM settings, to no avail (they all fail with 
the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.


was (Author: lebigot):
Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host, with localhost bypassed by the 
proxy, as it should be).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default), which is to use the system-wide proxy setting 
(from Apple > System Preferences > Network > Advanced > Proxies).

I just tried the following other JVM settings, to no avail (they all fail with 
the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 

[jira] [Updated] (SPARK-5738) [SQL] Reuse mutable row for each record at jsonStringToRow

2015-02-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-5738:

Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-3700

 [SQL] Reuse mutable row for each record at jsonStringToRow
 --

 Key: SPARK-5738
 URL: https://issues.apache.org/jira/browse/SPARK-5738
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.3.0
Reporter: Yanbo Liang

 Other table-scan-like operations, such as ParquetTableScan and HiveTableScan, use 
 a reusable mutable row during serialization to reduce garbage. We should apply 
 the same optimization to JSONRelation#buildScan().
 When serializing a JSON string to a row, reuse a mutable row for each record 
 and for each inner nested structure instead of creating a new one every time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321942#comment-14321942
 ] 

Sean Owen commented on SPARK-5813:
--

I kind of misstated this. I think this issue is more fundamentally one of 
distribution. I don't believe others are entitled to redistribute Oracle's 
JDK/JRE. So I don't think Spark can provide AMIs that contain the Oracle 
implementation. 

Providing tools to help someone build an AMI with the Oracle JDK is different. 
However, even there I don't think you can hide the license agreement and accept it 
on the user's behalf, or slip in what you think is an equivalent license-agreement 
process. It's not our call to make.

Dumb question, are AMIs being hosted and redistributed by the Spark project? I 
wasn't aware of these if so. Whoever does, yes, needs to think about what 
software licensing terms mean for redistribution. It's perhaps surprising to 
most people, and an artifact of history, that these OSS licenses kick in almost 
solely when you distribute, not use, the software!

Anyway: every installer that I've seen that provides the Oracle JDK is a 
wrapper around their downloader and EULA script. You could embed that process 
in a script, if you dare. My hunch is that it's not worth the trouble, if 
there's no obvious demand or motivation.

 Spark-ec2: Switch to OracleJDK
 --

 Key: SPARK-5813
 URL: https://issues.apache.org/jira/browse/SPARK-5813
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Florian Verhein
Priority: Minor

 Currently spark-ec2 uses OpenJDK; however, it is generally recommended to use the 
 Oracle JDK, especially for Hadoop deployments, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5480) GraphX pageRank: java.lang.ArrayIndexOutOfBoundsException:

2015-02-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321950#comment-14321950
 ] 

Sean Owen commented on SPARK-5480:
--

It doesn't look 100% like the same issue, but have a look at SPARK-1329 too

 GraphX pageRank: java.lang.ArrayIndexOutOfBoundsException: 
 ---

 Key: SPARK-5480
 URL: https://issues.apache.org/jira/browse/SPARK-5480
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Affects Versions: 1.2.0
 Environment: Yarn client
Reporter: Stephane Maarek

 Running the following code:
 val subgraph = graph.subgraph (
   vpred = (id, article) => // working predicate
 ).cache()
 println(s"Subgraph contains ${subgraph.vertices.count} nodes and 
 ${subgraph.edges.count} edges")
 val prGraph = subgraph.staticPageRank(5).cache
 val titleAndPrGraph = subgraph.outerJoinVertices(prGraph.vertices) {
   (v, title, rank) => (rank.getOrElse(0.0), title)
 }
 titleAndPrGraph.vertices.top(13) {
   Ordering.by((entry: (VertexId, (Double, _))) => entry._2._1)
 }.foreach(t => println(t._2._2._1 + ": " + t._2._1 + ", id:" + t._1))
 Returns a graph with 5000 nodes and 4000 edges.
 Then it crashes during the PageRank with the following:
 15/01/29 05:51:07 INFO scheduler.TaskSetManager: Starting task 125.0 in stage 
 39.0 (TID 1808, *HIDDEN, PROCESS_LOCAL, 2059 bytes)
 15/01/29 05:51:07 WARN scheduler.TaskSetManager: Lost task 107.0 in stage 
 39.0 (TID 1794, *HIDDEN): java.lang.ArrayIndexOutOfBoundsException: -1
 at 
 org.apache.spark.graphx.util.collection.GraphXPrimitiveKeyOpenHashMap$mcJI$sp.apply$mcJI$sp(GraphXPrimitiveKeyOpenHashMap.scala:64)
 at 
 org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:91)
 at 
 org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
 at 
 org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at 
 org.apache.spark.graphx.impl.EdgeRDDImpl$$anonfun$mapEdgePartitions$1.apply(EdgeRDDImpl.scala:110)
 at 
 org.apache.spark.graphx.impl.EdgeRDDImpl$$anonfun$mapEdgePartitions$1.apply(EdgeRDDImpl.scala:108)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:56)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 at 
 

[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321776#comment-14321776
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 2:23 AM:
---

Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host, with localhost bypassed by the 
proxy, as it should be).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default), which is to use the system-wide proxy setting 
(from Apple > System Preferences > Network > Advanced > Proxies).

I just tried the following other JVM settings, to no avail (they all fail with 
the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.


was (Author: lebigot):
Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host, with localhost bypassed by the 
proxy, as it should be).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default), which is to use the system-wide proxy setting 
(from Apple > System Preferences > Network > Advanced > Proxies).

I just tried the following other JVM settings, to no avail (they all fail with 
the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 

[jira] [Comment Edited] (SPARK-5823) Reuse mutable rows for inner structures when parsing JSON objects

2015-02-15 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321792#comment-14321792
 ] 

Yanbo Liang edited comment on SPARK-5823 at 2/15/15 2:57 AM:
-

Hi [~yhuai],
Actually, I have already implemented reusing the mutable row for inner structures at 
https://issues.apache.org/jira/browse/SPARK-5738.
However, I found that you mentioned Spark SQL's JSON support will be extended to 
handle the case where each object in the dataset might have a considerably 
different schema 
(https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html).
In this scenario, the inner nested mutable row strategy will not bring much 
performance improvement, am I right? 


was (Author: yanboliang):
Hi [~yhuai],
Actually, I have already implemented reusing the mutable row for inner structures at 
https://issues.apache.org/jira/browse/SPARK-5738.
However, I found that you mentioned Spark SQL's JSON support will be extended to 
handle the case where each object in the dataset might have a considerably 
different schema. In this scenario, the inner nested mutable row strategy will not 
bring much performance improvement, am I right? 

 Reuse mutable rows for inner structures when parsing JSON objects
 -

 Key: SPARK-5823
 URL: https://issues.apache.org/jira/browse/SPARK-5823
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yin Huai

 With SPARK-5738, we will reuse a mutable row for rows when parsing JSON 
 objects. We can do the same thing for inner structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5746) INSERT OVERWRITE throws FileNotFoundException when the source and destination point to the same table.

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321825#comment-14321825
 ] 

Apache Spark commented on SPARK-5746:
-

User 'yanbohappy' has created a pull request for this issue:
https://github.com/apache/spark/pull/4610

 INSERT OVERWRITE throws FileNotFoundException when the source and destination 
 point to the same table.
 --

 Key: SPARK-5746
 URL: https://issues.apache.org/jira/browse/SPARK-5746
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker

 With the newly introduced write support in the data source API, {{JSONRelation}} 
 and {{ParquetRelation2}} both suffer from this bug.
 The root cause is that we remove the source table before the insertion 
 ([here|https://github.com/apache/spark/blob/1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c/sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala#L112-L121]).
 The correct solution is to first insert into a temporary folder and then 
 overwrite the source table.
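 A minimal sketch of that "stage to a temporary folder, then swap" idea, assuming the 
 Hadoop FileSystem API; writeData is a hypothetical stand-in for the actual insert, and 
 this is illustrative only, not the actual fix:

 import java.util.UUID
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path

 def overwriteViaTemp(dest: String, conf: Configuration)(writeData: Path => Unit): Unit = {
   val destPath = new Path(dest)
   val fs = destPath.getFileSystem(conf)
   val tmpPath = new Path(destPath.getParent, s".${destPath.getName}-staging-${UUID.randomUUID()}")
   writeData(tmpPath)                     // 1. materialize the result in a staging folder first
   if (fs.exists(destPath) && !fs.delete(destPath, true)) {
     throw new java.io.IOException(s"Could not remove existing output at $destPath")
   }
   if (!fs.rename(tmpPath, destPath)) {   // 2. only then replace the destination
     throw new java.io.IOException(s"Could not move staged data from $tmpPath to $destPath")
   }
 }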



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-02-15 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321777#comment-14321777
 ] 

Yi Zhou edited comment on SPARK-5791 at 2/15/15 1:49 AM:
-

For the same input data set size (e.g., 1 TB), it takes about ~2 minutes on Hive on 
M/R with optimization parameters but about ~1 hour on Spark SQL.


was (Author: jameszhouyi):
For the same input dataset size, it takes about ~2 minutes on Hive on M/R with 
optimization parameters but about ~1 hour on Spark SQL.

 [Spark SQL] show poor performance when multiple table do join operation
 ---

 Key: SPARK-5791
 URL: https://issues.apache.org/jira/browse/SPARK-5791
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: Yi Zhou

 Spark SQL shows poor performance when multiple tables are joined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321776#comment-14321776
 ] 

Eric O. LEBIGOT (EOL) commented on SPARK-5820:
--

Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host).

The JVM uses the proxy settings from my default browser (as indicated in the 
Java OS X Preferences). I will try with other settings.

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Driver stacktrace:
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
 at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
 at scala.Option.foreach(Option.scala:245)
 at 
 

[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5826:
--
Attachment: (was: TestStream.java)

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class&lt;LongWritable&gt; kClass, Class&lt;Text&gt; vClass, 
 Class&lt;TextInputFormat&gt; fClass, Function&lt;Path, Boolean&gt; filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at 

[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321848#comment-14321848
 ] 

Littlestar commented on SPARK-5795:
---

works for me, thanks.

 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStreamCompile.java


 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 The following code can't compile in Java:
 JavaPairDStream&lt;Integer, Integer&gt; rs = ...
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, 
 TextOutputFormat.class, jobConf);
 but similar code on JavaPairRDD works OK:
 JavaPairRDD&lt;String, String&gt; counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, 
 TextOutputFormat.class, jobConf);
 
 Maybe the current
   def saveAsNewAPIHadoopFiles(
     prefix: String,
     suffix: String,
     keyClass: Class[_],
     valueClass: Class[_],
     outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
     conf: Configuration = new Configuration) {
   dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
     outputFormatClass, conf)
   }
 could be changed to
   def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
     prefix: String,
     suffix: String,
     keyClass: Class[_],
     valueClass: Class[_],
     outputFormatClass: Class[F],
     conf: Configuration = new Configuration) {
   dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
     outputFormatClass, conf)
   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-925) Allow ec2 scripts to load default options from a json file

2015-02-15 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321810#comment-14321810
 ] 

Nicholas Chammas commented on SPARK-925:


Loading config from a file seems like a good thing to have and matches what 
comparable tools like Ubuntu Juju and MIT StarCluster do.

However, I would favor a format other than JSON since JSON doesn't allow you to 
comment stuff out. I've used tools with JSON-backed config files, and they are 
super annoying to deal with if you try to alternate invocations between version 
A with this line uncommented and version B with the same line commented out.

YAML seems like a better choice for this task. What do you think [~shayping]?

cc [~shivaram]

 Allow ec2 scripts to load default options from a json file
 --

 Key: SPARK-925
 URL: https://issues.apache.org/jira/browse/SPARK-925
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Affects Versions: 0.8.0
Reporter: Shay Seng
Priority: Minor

 The option list for the ec2 script can be a little irritating to type in, 
 especially things like the path to the identity-file, region, zone, ami, etc.
 It would be nice if the ec2 script looked for an options.json file in the 
 following order: (1) PWD, (2) ~/spark-ec2, (3) the same dir as spark_ec2.py.
 Something like:
 def get_defaults_from_options():
     # (Relies on os, json, stat, sys already imported by spark_ec2.py.)
     # Check to see if an options.json file exists; if so, load it.
     # However, values in the options.json file can only override values in opts
     # if the opt values are None or '',
     # i.e. command-line options take precedence.
     defaults = {'aws-access-key-id': '', 'aws-secret-access-key': '', 'key-pair': '',
                 'identity-file': '', 'region': 'ap-southeast-1', 'zone': '',
                 'ami': '', 'slaves': 1, 'instance-type': 'm1.large'}
     # Look for options.json in the directory the cluster command was called from.
     # Had to modify the spark_ec2 wrapper script since it mangles the pwd.
     startwd = os.environ['STARTWD']
     if os.path.exists(os.path.join(startwd, 'options.json')):
         optionspath = os.path.join(startwd, 'options.json')
     else:
         optionspath = os.path.join(os.getcwd(), 'options.json')

     try:
         print 'Loading options file: ', optionspath
         with open(optionspath) as json_data:
             jdata = json.load(json_data)
             for k in jdata:
                 defaults[k] = jdata[k]
     except IOError:
         print 'Warning: options.json file not loaded'
     # Check permissions on identity-file, if defined; otherwise launch will
     # fail late, which is irritating.
     if defaults['identity-file'] != '':
         st = os.stat(defaults['identity-file'])
         user_can_read = bool(st.st_mode & stat.S_IRUSR)
         grp_perms = bool(st.st_mode & stat.S_IRWXG)
         others_perm = bool(st.st_mode & stat.S_IRWXO)
         if not user_can_read:
             print 'No read permission to read ', defaults['identity-file']
             sys.exit(1)
         if grp_perms or others_perm:
             print 'Permissions are too open, please chmod 600 file ', defaults['identity-file']
             sys.exit(1)
     # If defaults contain an AWS access id or secret key, set them in the environment.
     # Required for use with boto to access the AWS console.
     if defaults['aws-access-key-id'] != '':
         os.environ['AWS_ACCESS_KEY_ID'] = defaults['aws-access-key-id']
     if defaults['aws-secret-access-key'] != '':
         os.environ['AWS_SECRET_ACCESS_KEY'] = defaults['aws-secret-access-key']
     return defaults
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321901#comment-14321901
 ] 

Saisai Shao commented on SPARK-5826:


Quite curious why DStream#checkpoint() calls {{persist}} internally to cache 
the RDD rather than checkpointing the RDD. Any comments [~tdas]? Thanks a lot.

{code}
  def checkpoint(interval: Duration): DStream[T] = {
    if (isInitialized) {
      throw new UnsupportedOperationException(
        "Cannot change checkpoint interval of an DStream after streaming context has started")
    }
    persist()
    checkpointDuration = interval
    this
  }
{code}
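
For context, the per-batch RDDs are checkpointed later, when they are generated. A rough 
paraphrase of {{DStream#getOrCompute}} (an excerpt-style sketch that relies on DStream 
internals, not the exact source) shows where that happens:

{code}
// Paraphrase of DStream#getOrCompute: when the RDD for a batch is generated,
// it is persisted, and if the batch time lines up with checkpointDuration the
// RDD itself is marked for checkpointing.
private[streaming] def getOrCompute(time: Time): Option[RDD[T]] = {
  generatedRDDs.get(time).orElse {
    compute(time).map { newRDD =>
      if (storageLevel != StorageLevel.NONE) newRDD.persist(storageLevel)
      if (checkpointDuration != null && (time - zeroTime).isMultipleOf(checkpointDuration)) {
        newRDD.checkpoint()  // this is where the generated RDD is actually checkpointed
      }
      generatedRDDs.put(time, newRDD)
      newRDD
    }
  }
}
{code}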

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class&lt;LongWritable&gt; kClass, Class&lt;Text&gt; vClass, 
 Class&lt;TextInputFormat&gt; fClass, Function&lt;Path, Boolean&gt; filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 

[jira] [Created] (SPARK-5823) Reuse mutable rows for inner structures when parsing JSON objects

2015-02-15 Thread Yin Huai (JIRA)
Yin Huai created SPARK-5823:
---

 Summary: Reuse mutable rows for inner structures when parsing JSON 
objects
 Key: SPARK-5823
 URL: https://issues.apache.org/jira/browse/SPARK-5823
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yin Huai


With SPARK-5738, we will reuse a single mutable row across records when parsing 
JSON objects. We can do the same thing for the inner nested structures.
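
A rough sketch of the reuse pattern, using hypothetical names rather than the actual JsonRDD 
code (the caller must consume or copy each row before the next record is converted):

{code}
import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
import org.apache.spark.sql.types._

// Hypothetical converter: one mutable row is allocated per struct level and
// refilled for every record, instead of allocating a fresh Row per record and
// per nested struct.
class JsonToRowConverter(schema: StructType) {
  private val row = new GenericMutableRow(schema.fields.length)
  private val nested: Array[JsonToRowConverter] = schema.fields.map {
    case StructField(_, s: StructType, _, _) => new JsonToRowConverter(s)
    case _ => null
  }

  def convert(record: Map[String, Any]): GenericMutableRow = {
    var i = 0
    while (i < schema.fields.length) {
      val field = schema.fields(i)
      row(i) = record.get(field.name).map {
        case m: Map[String, Any] @unchecked => nested(i).convert(m)  // reuse the inner row
        case other => other
      }.orNull
      i += 1
    }
    row
  }
}
{code}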



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5824) CTAS should set null format in hive-0.13.1

2015-02-15 Thread Adrian Wang (JIRA)
Adrian Wang created SPARK-5824:
--

 Summary: CTAS should set null format in hive-0.13.1
 Key: SPARK-5824
 URL: https://issues.apache.org/jira/browse/SPARK-5824
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Adrian Wang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5637) Expose spark_ec2 as as StarCluster Plugin

2015-02-15 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5637.
--
Resolution: Won't Fix

I think this is simply something that should be done outside the core project.

 Expose spark_ec2 as as StarCluster Plugin
 -

 Key: SPARK-5637
 URL: https://issues.apache.org/jira/browse/SPARK-5637
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Alex Rothberg
Priority: Minor

 StarCluster has a lot of features in place for starting EC2 instances, and it 
 would be great to have an option to leverage that as a plugin.
 See: http://star.mit.edu/cluster/docs/latest/index.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5816) Add huge backward compatibility warning in DriverWrapper

2015-02-15 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321979#comment-14321979
 ] 

Saisai Shao edited comment on SPARK-5816 at 2/15/15 1:27 PM:
-

Sorry for the wrong link, I just mixed up the JIRA id. Really sorry about this; it 
seems I cannot delete the comments.


was (Author: jerryshao):
Sorry for wrong link, I just mess the JIRA id.

 Add huge backward compatibility warning in DriverWrapper
 

 Key: SPARK-5816
 URL: https://issues.apache.org/jira/browse/SPARK-5816
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Spark Core
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical

 As of Spark 1.3, we provide backward and forward compatibility in standalone 
 cluster mode through the REST submission gateway. HOWEVER, it nevertheless 
 goes through the legacy o.a.s.deploy.DriverWrapper, and the semantics of the 
 command line arguments there must not change. For instance, this was broken 
 in commit 20a6013106b56a1a1cc3e8cda092330ffbe77cc3.
 There is currently no warning against that in the class and so we should add 
 one before it's too late.
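 A sketch of the kind of warning that could be added (hypothetical wording and argument 
 handling, not the actual patch):
 {code}
 object DriverWrapper {
   /**
    * WARNING: do not change the number, order, or meaning of the command line
    * arguments handled below. The REST submission gateway generates them when
    * submitting to standalone masters of other Spark versions, so any change
    * here silently breaks cross-version submission (see SPARK-5816).
    */
   def main(args: Array[String]): Unit = args.toList match {
     case workerUrl :: userJar :: mainClass :: extraArgs =>
       // ... existing launch logic stays unchanged ...
       println(s"Launching $mainClass from $userJar via $workerUrl with $extraArgs")
     case _ =>
       System.err.println("Usage: DriverWrapper <workerUrl> <userJar> <driverMainClass> [options]")
       System.exit(-1)
   }
 }
 {code}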



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322022#comment-14322022
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 3:25 PM:
---

Thank you for investigating this.

The proxy server (localhost) does not require authentication.

The example (SparkPi) does work when no SOCKS proxy is used (in the OS X 
Network Preferences).

I did not mean to say that my proxy doesn't work for any JVM-based process (I'm 
not sure how I could test this). :) If you were referring to the three JVM 
settings that I tried (with the local SOCKS proxy server active), they are 
simply those of the OS X Java Preferences (Apple > System 
Preferences > Java > Network Settings). For all JVM proxy settings, 
./bin/run-example SparkPi fails in the same way (as in the original report).

Here is how to reproduce the problem:
1) Creation of the local SOCKS proxy: ssh -g -N -D 12345 login@remotehost
2) Activation of the proxy in OS X: Apple > System 
Preferences > Network > Advanced > Proxies, select SOCKS Proxy, set SOCKS Proxy 
Server to localhost, port 12345.

Now, browser connections go through the proxy (this should show in 
http://www.whatismyip.com/, for example).


was (Author: lebigot):
Thank you for investigating this.

The proxy server (localhost) does not require authentication.

The example (SparkPi) does work when no SOCKS proxy is used (in the OS X 
Network Preferences).

I did not mean to say that my proxy doesn't work for any JVM-based process (I'm 
not sure how I could test this). :) If you were referring to the three JVM 
settings that I tried (with the local SOCKS proxy server active), they are 
simply those of the OS X Java Preferences (Apple > System 
Preferences > Java > Network Settings). For all JVM proxy settings, 
./bin/run-example SparkPi fails in the same way (as in the original report).

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 

[jira] [Commented] (SPARK-5827) Add missing imports in the example of SQLContext

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321988#comment-14321988
 ] 

Apache Spark commented on SPARK-5827:
-

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/4615

 Add missing imports in the example of SQLContext
 

 Key: SPARK-5827
 URL: https://issues.apache.org/jira/browse/SPARK-5827
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Reporter: Takeshi Yamamuro
Priority: Trivial





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5816) Add huge backward compatibility warning in DriverWrapper

2015-02-15 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321979#comment-14321979
 ] 

Saisai Shao commented on SPARK-5816:


Sorry for the wrong link, I just mixed up the JIRA id.

 Add huge backward compatibility warning in DriverWrapper
 

 Key: SPARK-5816
 URL: https://issues.apache.org/jira/browse/SPARK-5816
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Spark Core
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical

 As of Spark 1.3, we provide backward and forward compatibility in standalone 
 cluster mode through the REST submission gateway. HOWEVER, it nevertheless 
 goes through the legacy o.a.s.deploy.DriverWrapper, and the semantics of the 
 command line arguments there must not change. For instance, this was broken 
 in commit 20a6013106b56a1a1cc3e8cda092330ffbe77cc3.
 There is currently no warning against that in the class and so we should add 
 one before it's too late.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5827) Add missing imports in the example of SQLContext

2015-02-15 Thread Takeshi Yamamuro (JIRA)
Takeshi Yamamuro created SPARK-5827:
---

 Summary: Add missing imports in the example of SQLContext
 Key: SPARK-5827
 URL: https://issues.apache.org/jira/browse/SPARK-5827
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Reporter: Takeshi Yamamuro
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322022#comment-14322022
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 3:27 PM:
---

Thank you for investigating this.

The proxy server (localhost) does not require authentication.

The example (SparkPi) does work when no SOCKS proxy is used (in the OS X 
Network Preferences).

I did not mean to say that my proxy doesn't work for any JVM-based process (I'm 
not sure how I could test this). :) If you were referring to the three JVM 
settings that I tried (with the local SOCKS proxy server active), they are 
simply those of the OS X Java Preferences (Apple > System 
Preferences > Java > Network Settings). For all JVM proxy settings, 
./bin/run-example SparkPi fails in the same way (as in the original report).

Here is how to reproduce the problem:
1) Creation of the local SOCKS proxy: ssh -g -N -D 12345 login@remotehost
2) Activation of the proxy in OS X: Apple > System 
Preferences > Network > Advanced > Proxies, select SOCKS Proxy, set SOCKS Proxy 
Server to localhost, port 12345, make sure that Bypassed hosts contains 
localhost (and 127.0.0.1, for good measure, maybe).

Now, connections go through the proxy (this should show in 
http://www.whatismyip.com/, for example).


was (Author: lebigot):
Thank you for investigating this.

The proxy server (localhost) does not require authentication.

The example (SparkPi) does work when no SOCKS proxy is used (in the OS X 
Network Preferences).

I did not mean to say that my proxy doesn't work for any JVM-based process (I'm 
not sure how I could test this). :) If you were referring to the three JVM 
settings that I tried (with the local SOCKS proxy server active), they are 
simply those of the OS X Java Preferences (Apple > System 
Preferences > Java > Network Settings). For all JVM proxy settings, 
./bin/run-example SparkPi fails in the same way (as in the original report).

Here is how to reproduce the problem:
1) Creation of the local SOCKS proxy: ssh -g -N -D 12345 login@remotehost
2) Activation of the proxy in OS X: Apple > System 
Preferences > Network > Advanced > Proxies, select SOCKS Proxy, set SOCKS Proxy 
Server to localhost, port 12345.

Now, browser connections go through the proxy (this should show in 
http://www.whatismyip.com/, for example).

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 

[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322022#comment-14322022
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 3:28 PM:
---

Thank you for investigating this.

The proxy server (localhost) does not require authentication.

The example (SparkPi) does work when no SOCKS proxy is used (in the OS X 
Network Preferences).

I did not mean to say that my proxy doesn't work for any JVM-based process (I'm 
not sure how I could test this). :) If you were referring to the three JVM 
settings that I tried (with the local SOCKS proxy server active), they are 
simply those of the OS X Java Preferences (Apple > System 
Preferences > Java > Network Settings). For all JVM proxy settings, 
./bin/run-example SparkPi fails in the same way (as in the original report).

Here is how to reproduce the problem:
1) Creation of the local SOCKS proxy: ssh -g -N -D 12345 login@remotehost
2) Activation of the proxy in OS X: Apple > System 
Preferences > Network > Advanced > Proxies, select SOCKS Proxy, set SOCKS Proxy 
Server to localhost, port 12345, make sure that Bypassed hosts contains 
localhost (and 127.0.0.1, for good measure, maybe).

Now, connections go through the proxy (this should show in 
http://www.whatismyip.com/, for example)… and the SparkPi example fails.


was (Author: lebigot):
Thank you for investigating this.

The proxy server (localhost) does not require authentication.

The example (SparkPi) does work when no SOCKS proxy is used (in the OS X 
Network Preferences).

I did not mean to say that my proxy doesn't work for any JVM-based process (I'm 
not sure how I could test this). :) If you were referring to the three JVM 
settings that I tried (with the local SOCKS proxy server active), they are 
simply those of the OS X Java Preferences (Apple > System 
Preferences > Java > Network Settings). For all JVM proxy settings, 
./bin/run-example SparkPi fails in the same way (as in the original report).

Here is how to reproduce the problem:
1) Creation of the local SOCKS proxy: ssh -g -N -D 12345 login@remotehost
2) Activation of the proxy in OS X: Apple > System 
Preferences > Network > Advanced > Proxies, select SOCKS Proxy, set SOCKS Proxy 
Server to localhost, port 12345, make sure that Bypassed hosts contains 
localhost (and 127.0.0.1, for good measure, maybe).

Now, connections go through the proxy (this should show in 
http://www.whatismyip.com/, for example).

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 

[jira] [Commented] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321980#comment-14321980
 ] 

Apache Spark commented on SPARK-5826:
-

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/4612

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class&lt;LongWritable&gt; kClass, Class&lt;Text&gt; vClass, 
 Class&lt;TextInputFormat&gt; fClass, Function&lt;Path, Boolean&gt; filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 

[jira] [Commented] (SPARK-3340) Deprecate ADD_JARS and ADD_FILES

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322020#comment-14322020
 ] 

Apache Spark commented on SPARK-3340:
-

User 'azagrebin' has created a pull request for this issue:
https://github.com/apache/spark/pull/4616

 Deprecate ADD_JARS and ADD_FILES
 

 Key: SPARK-3340
 URL: https://issues.apache.org/jira/browse/SPARK-3340
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Spark Core
Affects Versions: 1.1.0
Reporter: Andrew Or
  Labels: starter

 These were introduced before Spark submit even existed. Now that there are 
 many better ways of setting jars and python files through Spark submit, we 
 should deprecate these environment variables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322022#comment-14322022
 ] 

Eric O. LEBIGOT (EOL) commented on SPARK-5820:
--

Thank you for investigating this.

The proxy server (localhost) does not require authentication.

The example (SparkPi) does work when no SOCKS proxy is used (in the OS X 
Network Preferences).

I did not mean to say that my proxy doesn't work for any JVM-based process (I'm 
not sure how I could test this). :) If you were referring to the three JVM 
settings that I tried (with the local SOCKS proxy server active), they are 
simply those of the OS X Java Preferences (Apple > System 
Preferences > Java > Network Settings). For all JVM proxy settings, 
./bin/run-example SparkPi fails in the same way (as in the original report).

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Driver stacktrace:
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
 at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)

[jira] [Resolved] (SPARK-5827) Add missing imports in the example of SQLContext

2015-02-15 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5827.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4615
[https://github.com/apache/spark/pull/4615]

 Add missing imports in the example of SQLContext
 

 Key: SPARK-5827
 URL: https://issues.apache.org/jira/browse/SPARK-5827
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Reporter: Takeshi Yamamuro
Priority: Trivial
 Fix For: 1.3.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5827) Add missing imports in the example of SQLContext

2015-02-15 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5827:
-
Assignee: Takeshi Yamamuro

 Add missing imports in the example of SQLContext
 

 Key: SPARK-5827
 URL: https://issues.apache.org/jira/browse/SPARK-5827
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Reporter: Takeshi Yamamuro
Assignee: Takeshi Yamamuro
Priority: Trivial
 Fix For: 1.3.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5820:
-
Component/s: Examples
   Priority: Minor  (was: Major)

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
  Components: Examples
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)
Priority: Minor

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Driver stacktrace:
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
 at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
 at scala.Option.foreach(Option.scala:245)
 at 
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
 at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at 
 

[jira] [Commented] (SPARK-5745) Allow to use custom TaskMetrics implementation

2015-02-15 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322091#comment-14322091
 ] 

Patrick Wendell commented on SPARK-5745:


Hey [~jlewandowski] - TaskMetrics are a mostly internal concept. In fact, there 
isn't really any nice framework for aggregation internally. We instead have a 
bunch of manual aggregation in various places.

The primary user-facing API we have for aggregated counters is accumulators. Are 
there features lacking from accumulators that make it difficult for you to use 
them for your use case?
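
For example, a custom rows-read counter can be expressed with a plain accumulator today. A 
minimal, self-contained sketch (the input path and names are placeholders):

{code}
import org.apache.spark.{SparkConf, SparkContext}

object RowsReadExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rows-read").setMaster("local[2]"))

    // Driver-side counter; tasks add to it and the driver reads the merged value.
    val rowsRead = sc.accumulator(0L, "rows read")

    val parsed = sc.textFile("/some/placeholder/input").map { line =>
      rowsRead += 1                 // incremented on the executors for every record read
      line.split(",")
    }
    parsed.count()

    println(s"Rows read: ${rowsRead.value}")  // aggregated value, visible on the driver
    sc.stop()
  }
}
{code}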

 Allow to use custom TaskMetrics implementation
 --

 Key: SPARK-5745
 URL: https://issues.apache.org/jira/browse/SPARK-5745
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Reporter: Jacek Lewandowski

 There can be various RDD implementations, and {{TaskMetrics}} provides a 
 great API for collecting metrics and aggregating them. However, some RDDs may 
 want to register custom metrics (for example, the number of rows read), and 
 the current implementation doesn't allow for this.
 I suppose this can be changed without modifying the whole interface - 
 a factory could be used to create the initial {{TaskMetrics}} object, and 
 the default factory could be overridden by the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5828) Dynamic partition pattern support

2015-02-15 Thread Jianshi Huang (JIRA)
Jianshi Huang created SPARK-5828:


 Summary: Dynamic partition pattern support
 Key: SPARK-5828
 URL: https://issues.apache.org/jira/browse/SPARK-5828
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Jianshi Huang


Hi,

HCatalog allows you to specify the pattern of paths for partitions, which will 
be used by dynamic partition loading.

  
https://cwiki.apache.org/confluence/display/Hive/HCatalog+DynamicPartitions#HCatalogDynamicPartitions-ExternalTables

Can we have a similar feature in Spark SQL?

Thanks,
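
For illustration only (this is not an existing Spark SQL or HCatalog API, just a sketch of what a partition path pattern means): partition column values get substituted into a user-supplied directory template when data is written.

{code}
// Purely illustrative Scala sketch; the template syntax is made up for the example.
val pattern = "${year}/${month}/${day}"

def partitionPath(values: Map[String, String]): String =
  values.foldLeft(pattern) { case (p, (k, v)) => p.replace("${" + k + "}", v) }

// partitionPath(Map("year" -> "2015", "month" -> "02", "day" -> "15")) == "2015/02/15"
{code}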



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5502) User guide for isotonic regression

2015-02-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5502.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

 User guide for isotonic regression
 --

 Key: SPARK-5502
 URL: https://issues.apache.org/jira/browse/SPARK-5502
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, MLlib
Reporter: Xiangrui Meng
Assignee: Martin Zapletal
 Fix For: 1.3.0


 Add user guide to docs/mllib-regression.md with code examples.
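 A rough sketch of the kind of code example the guide could include, assuming the 
 1.3 MLlib API where {{IsotonicRegression}} is trained on (label, feature, weight) 
 triples; treat the exact calls as an assumption until the guide is merged:
 {code}
 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.mllib.regression.IsotonicRegression

 object IsotonicRegressionExample {
   def main(args: Array[String]): Unit = {
     val sc = new SparkContext(new SparkConf().setAppName("isotonic-example").setMaster("local[2]"))
     // Training data as (label, feature, weight) triples.
     val data = sc.parallelize(Seq(
       (1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (1.5, 3.0, 1.0), (3.0, 4.0, 1.0)))
     // Fit a monotonically increasing (isotonic) model.
     val model = new IsotonicRegression().setIsotonic(true).run(data)
     // Predictions for unseen feature values are interpolated from the fitted boundaries.
     println(model.predict(2.5))
     sc.stop()
   }
 }
 {code}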



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2015-02-15 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1867:
-
Component/s: Spark Core

 Spark Documentation Error causes java.lang.IllegalStateException: unread 
 block data
 ---

 Key: SPARK-1867
 URL: https://issues.apache.org/jira/browse/SPARK-1867
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: sam

 I've employed two System Administrators on a contract basis (for quite a bit 
 of money), and both contractors have independently hit the following 
 exception.  What we are doing is:
 1. Installing Spark 0.9.1 according to the documentation on the website, 
 along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
 2. Building a fat jar with a Spark app with sbt then trying to run it on the 
 cluster
 I've also included code snippets, and sbt deps at the bottom.
 When I've Googled this, there seem to be two somewhat vague responses:
 a) Mismatching spark versions on nodes/user code
 b) Need to add more jars to the SparkConf
 Now I know that (b) is not the problem having successfully run the same code 
 on other clusters while only including one jar (it's a fat jar).
 But I have no idea how to check for (a) - it appears Spark doesn't have any 
 version checks - it would be nice if it checked versions and threw a 
 mismatching-version exception: you have user code using version X and node Y 
 has version Z (a version-check sketch is appended after the dependency list below).
 I would be very grateful for advice on this.
 The exception:
 Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 
 0.0:1 failed 32 times (most recent failure: Exception failure: 
 java.lang.IllegalStateException: unread block data)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to 
 java.lang.IllegalStateException: unread block data [duplicate 59]
 My code snippet:
 val conf = new SparkConf()
.setMaster(clusterMaster)
.setAppName(appName)
.setSparkHome(sparkHome)
.setJars(SparkContext.jarOfClass(this.getClass))
 println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
 My SBT dependencies:
 // relevant
 "org.apache.spark" % "spark-core_2.10" % "0.9.1",
 "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
 // standard, probably unrelated
 "com.github.seratch" %% "awscala" % "[0.2,)",
 "org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
 "org.specs2" %% "specs2" % "1.14" % "test",
 "org.scala-lang" % "scala-reflect" % "2.10.3",
 "org.scalaz" %% "scalaz-core" % "7.0.5",
 "net.minidev" % "json-smart" % "1.2"
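 Not from the reporter's code: a minimal sketch of one way to start checking (a), i.e. 
 compare the Spark version the fat jar runs against with what the cluster reports. It 
 assumes a Spark release where {{SparkContext}} exposes a version field (that may not 
 exist in 0.9.x), so treat it as an illustration rather than a drop-in check:
 {code}
 import org.apache.spark.{SparkConf, SparkContext}

 object VersionCheck {
   def main(args: Array[String]): Unit = {
     // Submit this with the same fat jar and master URL as the failing app.
     val sc = new SparkContext(new SparkConf().setAppName("version-check"))
     // The Spark version the driver is actually running against.
     println("driver Spark version: " + sc.version)
     sc.stop()
   }
 }
 {code}
 Comparing this value with the version shown on the standalone Master web UI is one 
 way to spot the kind of mismatch described in (a).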



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5669) Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS

2015-02-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5669:
-
Assignee: Sean Owen

 Spark assembly includes incompatibly licensed libgfortran, libgcc code via 
 JBLAS
 

 Key: SPARK-5669
 URL: https://issues.apache.org/jira/browse/SPARK-5669
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Blocker
 Fix For: 1.3.0


 Sorry for Blocker, but it's a license issue. The Spark assembly includes 
 the following, from JBLAS:
 {code}
 lib/
 lib/static/
 lib/static/Mac OS X/
 lib/static/Mac OS X/x86_64/
 lib/static/Mac OS X/x86_64/libjblas_arch_flavor.jnilib
 lib/static/Mac OS X/x86_64/sse3/
 lib/static/Mac OS X/x86_64/sse3/libjblas.jnilib
 lib/static/Windows/
 lib/static/Windows/x86/
 lib/static/Windows/x86/libgfortran-3.dll
 lib/static/Windows/x86/libgcc_s_dw2-1.dll
 lib/static/Windows/x86/jblas_arch_flavor.dll
 lib/static/Windows/x86/sse3/
 lib/static/Windows/x86/sse3/jblas.dll
 lib/static/Windows/amd64/
 lib/static/Windows/amd64/libgfortran-3.dll
 lib/static/Windows/amd64/jblas.dll
 lib/static/Windows/amd64/libgcc_s_sjlj-1.dll
 lib/static/Windows/amd64/jblas_arch_flavor.dll
 lib/static/Linux/
 lib/static/Linux/i386/
 lib/static/Linux/i386/sse3/
 lib/static/Linux/i386/sse3/libjblas.so
 lib/static/Linux/i386/libjblas_arch_flavor.so
 lib/static/Linux/amd64/
 lib/static/Linux/amd64/sse3/
 lib/static/Linux/amd64/sse3/libjblas.so
 lib/static/Linux/amd64/libjblas_arch_flavor.so
 {code}
 Unfortunately the libgfortran and libgcc libraries included for Windows are 
 not licensed in a way that's compatible with Spark and the AL2 -- LGPL at 
 least.
 It's easy to exclude them. I'm not clear what it does to running on Windows; 
 I assume it can still work but the libs would have to be made available 
 locally and put on the shared library path manually. I don't think there's a 
 package manager as in Linux that would make it easily available. I'm not able 
 to test on Windows.
 If it doesn't work, the follow-up question is whether that means JBLAS has to 
 be removed on the double, or treated as a known issue for 1.3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5669) Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS

2015-02-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5669:
-
Target Version/s: 1.3.0, 1.1.2, 1.2.2

 Spark assembly includes incompatibly licensed libgfortran, libgcc code via 
 JBLAS
 

 Key: SPARK-5669
 URL: https://issues.apache.org/jira/browse/SPARK-5669
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Blocker
 Fix For: 1.3.0


 Sorry for Blocker, but it's a license issue. The Spark assembly includes 
 the following, from JBLAS:
 {code}
 lib/
 lib/static/
 lib/static/Mac OS X/
 lib/static/Mac OS X/x86_64/
 lib/static/Mac OS X/x86_64/libjblas_arch_flavor.jnilib
 lib/static/Mac OS X/x86_64/sse3/
 lib/static/Mac OS X/x86_64/sse3/libjblas.jnilib
 lib/static/Windows/
 lib/static/Windows/x86/
 lib/static/Windows/x86/libgfortran-3.dll
 lib/static/Windows/x86/libgcc_s_dw2-1.dll
 lib/static/Windows/x86/jblas_arch_flavor.dll
 lib/static/Windows/x86/sse3/
 lib/static/Windows/x86/sse3/jblas.dll
 lib/static/Windows/amd64/
 lib/static/Windows/amd64/libgfortran-3.dll
 lib/static/Windows/amd64/jblas.dll
 lib/static/Windows/amd64/libgcc_s_sjlj-1.dll
 lib/static/Windows/amd64/jblas_arch_flavor.dll
 lib/static/Linux/
 lib/static/Linux/i386/
 lib/static/Linux/i386/sse3/
 lib/static/Linux/i386/sse3/libjblas.so
 lib/static/Linux/i386/libjblas_arch_flavor.so
 lib/static/Linux/amd64/
 lib/static/Linux/amd64/sse3/
 lib/static/Linux/amd64/sse3/libjblas.so
 lib/static/Linux/amd64/libjblas_arch_flavor.so
 {code}
 Unfortunately the libgfortran and libgcc libraries included for Windows are 
 not licensed in a way that's compatible with Spark and the AL2 -- LGPL at 
 least.
 It's easy to exclude them. I'm not clear what it does to running on Windows; 
 I assume it can still work but the libs would have to be made available 
 locally and put on the shared library path manually. I don't think there's a 
 package manager as in Linux that would make it easily available. I'm not able 
 to test on Windows.
 If it doesn't work, the follow-up question is whether that means JBLAS has to 
 be removed on the double, or treated as a known issue for 1.3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5669) Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS

2015-02-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322084#comment-14322084
 ] 

Sean Owen commented on SPARK-5669:
--

Since the follow-up change is to remove JBLAS, and that's covered in 
SPARK-5814, shall we track the remaining work there?

 Spark assembly includes incompatibly licensed libgfortran, libgcc code via 
 JBLAS
 

 Key: SPARK-5669
 URL: https://issues.apache.org/jira/browse/SPARK-5669
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Blocker
 Fix For: 1.3.0


 Sorry for Blocker, but it's a license issue. The Spark assembly includes 
 the following, from JBLAS:
 {code}
 lib/
 lib/static/
 lib/static/Mac OS X/
 lib/static/Mac OS X/x86_64/
 lib/static/Mac OS X/x86_64/libjblas_arch_flavor.jnilib
 lib/static/Mac OS X/x86_64/sse3/
 lib/static/Mac OS X/x86_64/sse3/libjblas.jnilib
 lib/static/Windows/
 lib/static/Windows/x86/
 lib/static/Windows/x86/libgfortran-3.dll
 lib/static/Windows/x86/libgcc_s_dw2-1.dll
 lib/static/Windows/x86/jblas_arch_flavor.dll
 lib/static/Windows/x86/sse3/
 lib/static/Windows/x86/sse3/jblas.dll
 lib/static/Windows/amd64/
 lib/static/Windows/amd64/libgfortran-3.dll
 lib/static/Windows/amd64/jblas.dll
 lib/static/Windows/amd64/libgcc_s_sjlj-1.dll
 lib/static/Windows/amd64/jblas_arch_flavor.dll
 lib/static/Linux/
 lib/static/Linux/i386/
 lib/static/Linux/i386/sse3/
 lib/static/Linux/i386/sse3/libjblas.so
 lib/static/Linux/i386/libjblas_arch_flavor.so
 lib/static/Linux/amd64/
 lib/static/Linux/amd64/sse3/
 lib/static/Linux/amd64/sse3/libjblas.so
 lib/static/Linux/amd64/libjblas_arch_flavor.so
 {code}
 Unfortunately the libgfortran and libgcc libraries included for Windows are 
 not licensed in a way that's compatible with Spark and the AL2 -- LGPL at 
 least.
 It's easy to exclude them. I'm not clear what it does to running on Windows; 
 I assume it can still work but the libs would have to be made available 
 locally and put on the shared library path manually. I don't think there's a 
 package manager as in Linux that would make it easily available. I'm not able 
 to test on Windows.
 If it doesn't work, the follow-up question is whether that means JBLAS has to 
 be removed on the double, or treated as a known issue for 1.3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5826:
---
Priority: Critical  (was: Minor)

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration, 
 but it throws a strange exception.
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at 

[jira] [Commented] (SPARK-5669) Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS

2015-02-15 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322082#comment-14322082
 ] 

Xiangrui Meng commented on SPARK-5669:
--

PR #4453 resolves this issue for master and branch-1.3.

 Spark assembly includes incompatibly licensed libgfortran, libgcc code via 
 JBLAS
 

 Key: SPARK-5669
 URL: https://issues.apache.org/jira/browse/SPARK-5669
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Blocker
 Fix For: 1.3.0


 Sorry for Blocker, but it's a license issue. The Spark assembly includes 
 the following, from JBLAS:
 {code}
 lib/
 lib/static/
 lib/static/Mac OS X/
 lib/static/Mac OS X/x86_64/
 lib/static/Mac OS X/x86_64/libjblas_arch_flavor.jnilib
 lib/static/Mac OS X/x86_64/sse3/
 lib/static/Mac OS X/x86_64/sse3/libjblas.jnilib
 lib/static/Windows/
 lib/static/Windows/x86/
 lib/static/Windows/x86/libgfortran-3.dll
 lib/static/Windows/x86/libgcc_s_dw2-1.dll
 lib/static/Windows/x86/jblas_arch_flavor.dll
 lib/static/Windows/x86/sse3/
 lib/static/Windows/x86/sse3/jblas.dll
 lib/static/Windows/amd64/
 lib/static/Windows/amd64/libgfortran-3.dll
 lib/static/Windows/amd64/jblas.dll
 lib/static/Windows/amd64/libgcc_s_sjlj-1.dll
 lib/static/Windows/amd64/jblas_arch_flavor.dll
 lib/static/Linux/
 lib/static/Linux/i386/
 lib/static/Linux/i386/sse3/
 lib/static/Linux/i386/sse3/libjblas.so
 lib/static/Linux/i386/libjblas_arch_flavor.so
 lib/static/Linux/amd64/
 lib/static/Linux/amd64/sse3/
 lib/static/Linux/amd64/sse3/libjblas.so
 lib/static/Linux/amd64/libjblas_arch_flavor.so
 {code}
 Unfortunately the libgfortran and libgcc libraries included for Windows are 
 not licensed in a way that's compatible with Spark and the AL2 -- LGPL at 
 least.
 It's easy to exclude them. I'm not clear what it does to running on Windows; 
 I assume it can still work but the libs would have to be made available 
 locally and put on the shared library path manually. I don't think there's a 
 package manager as in Linux that would make it easily available. I'm not able 
 to test on Windows.
 If it doesn't work, the follow-up question is whether that means JBLAS has to 
 be removed on the double, or treated as a known issue for 1.3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5815) Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS

2015-02-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5815:
-
Assignee: Sean Owen  (was: Xiangrui Meng)

 Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS
 --

 Key: SPARK-5815
 URL: https://issues.apache.org/jira/browse/SPARK-5815
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Affects Versions: 1.3.0
Reporter: Xiangrui Meng
Assignee: Sean Owen

 It is generally bad to expose types defined in a 3rd-party package in Spark 
 public APIs. We should deprecate those methods in SVDPlusPlus and replace 
 them in the next release.
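 A generic sketch of the deprecate-and-replace pattern meant here (the type and method 
 names below are placeholders, not the real {{SVDPlusPlus}} signatures):
 {code}
 // Illustrative only: keep the old signature but mark it deprecated, and add a
 // replacement that exposes a Spark-owned type instead of the 3rd-party one.
 class ThirdPartyMatrix            // stands in for jblas DoubleMatrix
 class SparkOwnedMatrix            // stands in for a type Spark controls

 object SvdLikeApi {
   @deprecated("Use run(SparkOwnedMatrix) instead", "1.3.0")
   def run(m: ThirdPartyMatrix): Unit = run(toSparkType(m))

   def run(m: SparkOwnedMatrix): Unit = {
     // actual computation would go here
   }

   private def toSparkType(m: ThirdPartyMatrix): SparkOwnedMatrix = new SparkOwnedMatrix
 }
 {code}
 Callers keep compiling against the old method (with a deprecation warning) while new 
 code moves to the replacement, which is the migration path the description asks for.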



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5816) Add huge backward compatibility warning in DriverWrapper

2015-02-15 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5816:
-
Affects Version/s: 1.3.0

 Add huge backward compatibility warning in DriverWrapper
 

 Key: SPARK-5816
 URL: https://issues.apache.org/jira/browse/SPARK-5816
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Spark Core
Affects Versions: 1.3.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical

 As of Spark 1.3, we provide backward and forward compatibility in standalone 
 cluster mode through the REST submission gateway. HOWEVER, it nevertheless 
 goes through the legacy o.a.s.deploy.DriverWrapper, and the semantics of the 
 command line arguments there must not change. For instance, this was broken 
 in commit 20a6013106b56a1a1cc3e8cda092330ffbe77cc3.
 There is currently no warning against that in the class and so we should add 
 one before it's too late.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321761#comment-14321761
 ] 

Apache Spark commented on SPARK-5795:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4608

 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStreamCompile.java


 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 the following code can't compile in Java:
 JavaPairDStream<Integer, Integer> rs =
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, 
 TextOutputFormat.class, jobConf);
 but similar code in JavaPairRDD works OK:
 JavaPairRDD<String, String> counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, 
 TextOutputFormat.class, jobConf);
 
 maybe the 
   def saveAsNewAPIHadoopFiles(
   prefix: String,
   suffix: String,
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
   conf: Configuration = new Configuration) {
 dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
 outputFormatClass, conf)
   }
 should be changed to:
 def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
   prefix: String,
   suffix: String,
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[F],
   conf: Configuration = new Configuration) {
 dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
 outputFormatClass, conf)
   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321849#comment-14321849
 ] 

Littlestar edited comment on SPARK-5826 at 2/15/15 7:51 AM:


Test code uploaded.

It throws an exception every 2 seconds.

15/02/15 15:50:35 ERROR actor.OneForOneStrategy: 
org.apache.hadoop.conf.Configuration
java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
at 
org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
at 
org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at 
org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
at 
org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
at 
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


was (Author: cnstar9988):
testcode upload.
!TestStream.java!

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: 

[jira] [Commented] (SPARK-5824) CTAS should set null format in hive-0.13.1

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321806#comment-14321806
 ] 

Apache Spark commented on SPARK-5824:
-

User 'adrian-wang' has created a pull request for this issue:
https://github.com/apache/spark/pull/4609

 CTAS should set null format in hive-0.13.1
 --

 Key: SPARK-5824
 URL: https://issues.apache.org/jira/browse/SPARK-5824
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Adrian Wang





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321848#comment-14321848
 ] 

Littlestar edited comment on SPARK-5795 at 2/15/15 8:05 AM:


I merged pull/4608 and rebuilt; it works for me, thanks.


was (Author: cnstar9988):
works for me, thanks.

 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStreamCompile.java


 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 the following code can't compile in Java:
 JavaPairDStream<Integer, Integer> rs =
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, 
 TextOutputFormat.class, jobConf);
 but similar code in JavaPairRDD works OK:
 JavaPairRDD<String, String> counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, 
 TextOutputFormat.class, jobConf);
 
 maybe the 
   def saveAsNewAPIHadoopFiles(
   prefix: String,
   suffix: String,
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
   conf: Configuration = new Configuration) {
 dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
 outputFormatClass, conf)
   }
 should be changed to:
 def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
   prefix: String,
   suffix: String,
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[F],
   conf: Configuration = new Configuration) {
 dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
 outputFormatClass, conf)
   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5826:
--
Attachment: TestStream.java

Test code uploaded.
!TestStream.java!

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration, 
 but it throws a strange exception.
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at 

[jira] [Updated] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-15 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5795:
-
Priority: Critical  (was: Minor)

Yes, I've seen the same problem and been meaning to do something about it. It 
makes you do this to use {{JavaPairDStream}}:
https://github.com/OryxProject/oryx/blob/master/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/BatchLayer.java#L187

So basically, this is how it's declared now:

{code}
  def saveAsNewAPIHadoopFiles(
  prefix: String,
  suffix: String,
  keyClass: Class[_],
  valueClass: Class[_],
   outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
  conf: Configuration = new Configuration) {
dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
outputFormatClass, conf)
  }
{code}

but this works, and is how it works in {{JavaPairRDD}}:

{code}
  def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
  prefix: String,
  suffix: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[F],
  conf: Configuration = new Configuration) {
dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
outputFormatClass, conf)
  }
{code}

I worry about an API change, of course, but I think the current API isn't 
directly callable, so it seems OK to change.

For a simple demo, try compiling this:

{code}
JavaPairDStream<IntWritable, Text> pds = null;

pds.saveAsNewAPIHadoopFiles("", "", IntWritable.class, IntWritable.class, 
SequenceFileOutputFormat.class);
{code}

The change above makes it work. I'll open a PR. I bumped the priority based on 
my understanding of the issue.

 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStreamCompile.java


 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 the following code can't compile in Java:
 JavaPairDStream<Integer, Integer> rs =
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, 
 TextOutputFormat.class, jobConf);
 but similar code in JavaPairRDD works OK:
 JavaPairRDD<String, String> counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, 
 TextOutputFormat.class, jobConf);
 
 maybe the 
   def saveAsNewAPIHadoopFiles(
   prefix: String,
   suffix: String,
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
   conf: Configuration = new Configuration) {
 dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
 outputFormatClass, conf)
   }
 should be changed to:
 def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
   prefix: String,
   suffix: String,
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[F],
   conf: Configuration = new Configuration) {
 dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
 outputFormatClass, conf)
   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5826:
--
Component/s: Streaming

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration, 
 but it throws a strange exception.
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 

[jira] [Updated] (SPARK-5746) Check invalid cases for the write path of data source API

2015-02-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-5746:

Summary: Check invalid cases for the write path of data source API  (was: 
INSERT OVERWRITE throws FileNotFoundException when the source and destination 
point to the same table.)

 Check invalid cases for the write path of data source API
 -

 Key: SPARK-5746
 URL: https://issues.apache.org/jira/browse/SPARK-5746
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker

 With the newly introduced write support of data source API, {{JSONRelation}} 
 and {{ParquetRelation2}} both suffer this bug.
 The root cause is that we removed the source table before insertion 
 ([here|https://github.com/apache/spark/blob/1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c/sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala#L112-L121]).
 The correct solution should be first insert into a temporary folder, and then 
 overwrite the source table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5746) Check invalid cases for the write path of data source API

2015-02-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-5746:

Description: 
Right now, with the newly introduced write support of data source API, 
{{JSONRelation}} and {{ParquetRelation2}} both delete data first when the save 
mode is overwrite 
([here|https://github.com/apache/spark/blob/1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c/sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala#L112-L121])
 and this behavior introduces issues when the destination table is an input 
table of the query. For example
{code}
INSERT OVERWRITE t SELECT * FROM t
{code}

We need to add an analysis rule to check cases that are invalid for the write 
path of data source API.

  was:
With the newly introduced write support of data source API, {{JSONRelation}} 
and {{ParquetRelation2}} both suffer this bug.

The root cause is that we removed the source table before insertion 
([here|https://github.com/apache/spark/blob/1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c/sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala#L112-L121]).

The correct solution should be first insert into a temporary folder, and then 
overwrite the source table.


 Check invalid cases for the write path of data source API
 -

 Key: SPARK-5746
 URL: https://issues.apache.org/jira/browse/SPARK-5746
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker

 Right now, with the newly introduced write support of data source API, 
 {{JSONRelation}} and {{ParquetRelation2}} both delete data first when the 
 save mode is overwrite 
 ([here|https://github.com/apache/spark/blob/1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c/sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala#L112-L121])
  and this behavior introduces issues when the destination table is an input 
 table of the query. For example
 {code}
 INSERT OVERWRITE t SELECT * FROM t
 {code}
 We need to add an analysis rule to check cases that are invalid for the write 
 path of data source API.
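 A simplified sketch of the kind of check such an analysis rule would perform (the 
 classes below are stand-ins for illustration, not Spark's actual Catalyst types):
 {code}
 // Reject an overwrite whose target also feeds the query: overwrite deletes the
 // target's data before the query has had a chance to read it.
 case class Relation(path: String)
 case class InsertOverwrite(target: Relation, inputs: Seq[Relation])

 def checkWritePath(cmd: InsertOverwrite): Unit = {
   if (cmd.inputs.contains(cmd.target)) {
     throw new IllegalArgumentException(
       s"Cannot overwrite ${cmd.target.path}: it is also an input of the query")
   }
 }
 {code}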



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5746) Check invalid cases for the write path of data source API

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322273#comment-14322273
 ] 

Apache Spark commented on SPARK-5746:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/4617

 Check invalid cases for the write path of data source API
 -

 Key: SPARK-5746
 URL: https://issues.apache.org/jira/browse/SPARK-5746
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker

 Right now, with the newly introduced write support of data source API, 
 {{JSONRelation}} and {{ParquetRelation2}} both delete data first when the 
 save mode is overwrite 
 ([here|https://github.com/apache/spark/blob/1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c/sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala#L112-L121])
  and this behavior introduces issues when the destination table is an input 
 table of the query. For example
 {code}
 INSERT OVERWRITE t SELECT * FROM t
 {code}
 We need to add an analysis rule to check cases that are invalid for the write 
 path of data source API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5823) Reuse mutable rows for inner structures when parsing JSON objects

2015-02-15 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321792#comment-14321792
 ] 

Yanbo Liang commented on SPARK-5823:


Hi [~yhuai],
Actually, I have already implemented reusing the mutable row for inner structures in 
https://issues.apache.org/jira/browse/SPARK-5738.
However, I saw that you mentioned extending Spark SQL's JSON support to handle the 
case where each object in the dataset might have a considerably different schema. 
In that scenario, the inner nested mutable row strategy will not bring much 
performance improvement, am I right?
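
For reference, a tiny self-contained sketch of the reuse idea under discussion (not Spark's actual JSON parsing code; the row class is made up for illustration):

{code}
object RowReuseSketch {
  // Reuse one mutable row across all records instead of allocating a new row per
  // record, which is the garbage-reduction strategy referred to above.
  final class SimpleMutableRow(size: Int) {
    private val values = new Array[Any](size)
    def update(i: Int, v: Any): Unit = values(i) = v
    def snapshot(): Array[Any] = values.clone()  // copy only when a caller must retain it
  }

  def parseAll(records: Iterator[Map[String, Any]], fields: Seq[String]): Iterator[Array[Any]] = {
    val row = new SimpleMutableRow(fields.length)  // allocated once, reused for every record
    records.map { rec =>
      var i = 0
      while (i < fields.length) {
        row.update(i, rec.getOrElse(fields(i), null))
        i += 1
      }
      row.snapshot()
    }
  }
}
{code}
The same trick applies one level down: a nested mutable row for each inner structure, which is what this sub-task proposes.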

 Reuse mutable rows for inner structures when parsing JSON objects
 -

 Key: SPARK-5823
 URL: https://issues.apache.org/jira/browse/SPARK-5823
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yin Huai

 With SPARK-5738, we will reuse a mutable row for rows when parsing JSON 
 objects. We can do the same thing for inner structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5821) [SQL] CTAS command failure when your don't have write permission of the parent directory

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321824#comment-14321824
 ] 

Apache Spark commented on SPARK-5821:
-

User 'yanbohappy' has created a pull request for this issue:
https://github.com/apache/spark/pull/4610

 [SQL] CTAS command failure when your don't have write permission of the 
 parent directory
 

 Key: SPARK-5821
 URL: https://issues.apache.org/jira/browse/SPARK-5821
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Yanbo Liang

 When you run a CTAS command such as
 CREATE TEMPORARY TABLE jsonTable
 USING org.apache.spark.sql.json.DefaultSource
 OPTIONS (
 path "/a/b/c/d"
 ) AS
 SELECT a, b FROM jt,
 you will run into a failure if you don't have write permission for directory 
 /a/b/c, whether d is a directory or a file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321776#comment-14321776
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 1:50 AM:
---

Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default).

I just tried these settings, to no avail (they all fail with the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.


was (Author: lebigot):
Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host).

The JVM uses the proxy settings from my default browser (as indicated in the 
Java OS X Preferences).

I just tried another setting: Use proxy server (Advanced: For all protocols, on 
and off), Bypass proxy server for local addresses. This does not work either.

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.init(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Driver stacktrace:
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
 at 
 

[jira] [Comment Edited] (SPARK-5820) Example does not work when using SOCKS proxy

2015-02-15 Thread Eric O. LEBIGOT (EOL) (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321776#comment-14321776
 ] 

Eric O. LEBIGOT (EOL) edited comment on SPARK-5820 at 2/15/15 2:23 AM:
---

Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host, with localhost bypassed by the 
proxy, as it should be).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default), which is to use the system-wide proxy setting 
(from Apple > System Preferences > Network > Advanced > Proxies).

I just tried the following other JVM settings, to no avail (they all fail with 
the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.
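
For reference, a minimal sketch of the equivalent JVM-level proxy settings. The property 
names are standard Java networking properties (socksNonProxyHosts needs Java 7+); the 
host, port, and bypass list below are assumptions about the local setup, not a confirmed fix.
{noformat}
// Point the JVM at the local SOCKS proxy explicitly, but exclude local addresses,
// so the executor's HTTP fetch from the driver's local file server bypasses the proxy.
System.setProperty("socksProxyHost", "127.0.0.1")               // assumed proxy host
System.setProperty("socksProxyPort", "1080")                    // assumed proxy port
System.setProperty("socksNonProxyHosts", "localhost|127.0.0.1") // bypass for local addresses
{noformat}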


was (Author: lebigot):
Good question, I should have added more details: I am running locally (and have 
a local SOCKS proxy connection to a remote host).

In the Java OS X Preferences, I had set the JVM to use the proxy settings from 
my default browser (the default), which is to use the system-wide proxy setting 
(from Apple > System Preferences > Network > Advanced > Proxies).

I just tried the following other JVM settings, to no avail (they all fail with 
the same error):
- Use proxy server (Advanced: For all protocols, on and off), Bypass proxy 
server for local addresses.
- Direct connection.

It is strange that even the last one fails.

 Example does not work when using SOCKS proxy
 

 Key: SPARK-5820
 URL: https://issues.apache.org/jira/browse/SPARK-5820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Eric O. LEBIGOT (EOL)

 When using a SOCKS proxy (on OS X 10.10.2), running even the basic example 
 ./bin/run-example SparkPi 10 fails.
 -- Partial log --
 15/02/14 23:23:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
 aborting job
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Cancelling stage 0
 15/02/14 23:23:00 INFO Executor: Executor is trying to kill task 1.0 in stage 
 0.0 (TID 1)
 15/02/14 23:23:00 INFO TaskSchedulerImpl: Stage 0 was cancelled
 15/02/14 23:23:00 INFO DAGScheduler: Job 0 failed: reduce at 
 SparkPi.scala:35, took 1.920223 s
 Exception in thread "main" org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
 Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketException: 
 Malformed reply from SOCKS server
 at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.&lt;init&gt;(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1003)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:951)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:582)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:433)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
 at 
 org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
 at 
 org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 

[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-15 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322208#comment-14322208
 ] 

Florian Verhein commented on SPARK-5813:


Good point. I think you're right re: scripting away - I understand it's 
sometimes done by sysadmins/ops to automate their installation processes 
in-house, but that is a different situation. Thanks for that. 

spark_ec2 works by looking up an existing AMI and using it to launch EC2 
instances. I don't know who currently maintains these AMIs. 



 Spark-ec2: Switch to OracleJDK
 --

 Key: SPARK-5813
 URL: https://issues.apache.org/jira/browse/SPARK-5813
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Florian Verhein
Priority: Minor

 Currently spark-ec2 uses OpenJDK; however, it is generally recommended to use the 
 Oracle JDK, especially for Hadoop deployments. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4865) Include temporary tables in SHOW TABLES

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322309#comment-14322309
 ] 

Apache Spark commented on SPARK-4865:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/4618

 Include temporary tables in SHOW TABLES
 ---

 Key: SPARK-4865
 URL: https://issues.apache.org/jira/browse/SPARK-4865
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: Misha Chernetsov
Priority: Blocker





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322317#comment-14322317
 ] 

Littlestar commented on SPARK-5826:
---

I merged pull request 4612 and rebuilt; it works for me with no exception. Thanks.



 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String directory, 
 Class&lt;LongWritable&gt; kClass, Class&lt;Text&gt; vClass, Class&lt;TextInputFormat&gt; fClass, 
 Function&lt;Path, Boolean&gt; filter, boolean newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration, 
 but it throws a strange exception.
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 

[jira] [Created] (SPARK-5829) JavaStreamingContext.fileStream run task repeated empty when no more new files

2015-02-15 Thread Littlestar (JIRA)
Littlestar created SPARK-5829:
-

 Summary: JavaStreamingContext.fileStream run task repeated empty 
when no more new files
 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar


Spark master (1.3.0) with the SPARK-5826 patch.

JavaStreamingContext.fileStream runs an empty task repeatedly when there are no more new files.

To reproduce:
  1. mkdir /testspark/watchdir on HDFS.
  2. Run the app.
  3. Put some text files into /testspark/watchdir.
Every 30 seconds, the Spark log indicates that a new sub-task runs, and 
/testspark/resultdir/ gets a new directory with empty files.

Even when no new files are added, a new task still runs on an empty RDD.

{noformat}
package my.test.hadoop.spark;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class TestStream {
    @SuppressWarnings({ "serial", "resource" })
    public static void main(String[] args) throws Exception {

        SparkConf conf = new SparkConf().setAppName("TestStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
        jssc.checkpoint("/testspark/checkpointdir");

        Configuration jobConf = new Configuration();
        jobConf.set("my.test.fields", "fields");

        // Watch /testspark/watchdir and map every (offset, line) record to the pair (1, 1).
        JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                LongWritable.class, Text.class, TextInputFormat.class,
                new Function<Path, Boolean>() {
                    @Override
                    public Boolean call(Path v1) throws Exception {
                        return true;
                    }
                }, true, jobConf).mapToPair(
                new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                    @Override
                    public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                        return new Tuple2<Integer, Integer>(1, 1);
                    }
                });

        // Sum the pairs per batch and write each batch's result under /testspark/resultdir/.
        JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });

        rs.checkpoint(Durations.seconds(60));
        rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                Integer.class, Integer.class, TextOutputFormat.class);
        jssc.start();
        jssc.awaitTermination();
    }
}

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5829) JavaStreamingContext.fileStream run task repeated empty when no more new files

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322321#comment-14322321
 ] 

Littlestar commented on SPARK-5829:
---

When I add new files into /testspark/watchdir, it runs a new task with good 
output.
When no new files are added, it still runs a new task with an empty RDD every 30 
seconds. (I think there is a bug here in the case where no new files are found.)

 JavaStreamingContext.fileStream run task repeated empty when no more new files
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar

 Spark master (1.3.0) with the SPARK-5826 patch.
 JavaStreamingContext.fileStream runs an empty task repeatedly when there are no more new files.
 To reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. Run the app.
   3. Put some text files into /testspark/watchdir.
 Every 30 seconds, the Spark log indicates that a new sub-task runs, and /testspark/resultdir/ gets a new directory with empty files.
 Even when no new files are added, a new task still runs on an empty RDD.
 {noformat}
 package my.test.hadoop.spark;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;

 import scala.Tuple2;

 public class TestStream {
     @SuppressWarnings({ "serial", "resource" })
     public static void main(String[] args) throws Exception {
         SparkConf conf = new SparkConf().setAppName("TestStream");
         JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
         jssc.checkpoint("/testspark/checkpointdir");

         Configuration jobConf = new Configuration();
         jobConf.set("my.test.fields", "fields");

         // Watch /testspark/watchdir and map every (offset, line) record to the pair (1, 1).
         JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                 LongWritable.class, Text.class, TextInputFormat.class,
                 new Function<Path, Boolean>() {
                     @Override
                     public Boolean call(Path v1) throws Exception {
                         return true;
                     }
                 }, true, jobConf).mapToPair(
                 new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                     @Override
                     public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                         return new Tuple2<Integer, Integer>(1, 1);
                     }
                 });

         // Sum the pairs per batch and write each batch's result under /testspark/resultdir/.
         JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
             @Override
             public Integer call(Integer arg0, Integer arg1) throws Exception {
                 return arg0 + arg1;
             }
         });

         rs.checkpoint(Durations.seconds(60));
         rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                 Integer.class, Integer.class, TextOutputFormat.class);
         jssc.start();
         jssc.awaitTermination();
     }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5829:
--
Summary: JavaStreamingContext.fileStream run task loop repeated  empty when 
no more new files found  (was: JavaStreamingContext.fileStream run task 
repeated empty when no more new files)

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar

 Spark master (1.3.0) with the SPARK-5826 patch.
 JavaStreamingContext.fileStream runs an empty task repeatedly when there are no more new files.
 To reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. Run the app.
   3. Put some text files into /testspark/watchdir.
 Every 30 seconds, the Spark log indicates that a new sub-task runs, and /testspark/resultdir/ gets a new directory with empty files.
 Even when no new files are added, a new task still runs on an empty RDD.
 {noformat}
 package my.test.hadoop.spark;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;

 import scala.Tuple2;

 public class TestStream {
     @SuppressWarnings({ "serial", "resource" })
     public static void main(String[] args) throws Exception {
         SparkConf conf = new SparkConf().setAppName("TestStream");
         JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
         jssc.checkpoint("/testspark/checkpointdir");

         Configuration jobConf = new Configuration();
         jobConf.set("my.test.fields", "fields");

         // Watch /testspark/watchdir and map every (offset, line) record to the pair (1, 1).
         JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                 LongWritable.class, Text.class, TextInputFormat.class,
                 new Function<Path, Boolean>() {
                     @Override
                     public Boolean call(Path v1) throws Exception {
                         return true;
                     }
                 }, true, jobConf).mapToPair(
                 new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                     @Override
                     public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                         return new Tuple2<Integer, Integer>(1, 1);
                     }
                 });

         // Sum the pairs per batch and write each batch's result under /testspark/resultdir/.
         JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
             @Override
             public Integer call(Integer arg0, Integer arg1) throws Exception {
                 return arg0 + arg1;
             }
         });

         rs.checkpoint(Durations.seconds(60));
         rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                 Integer.class, Integer.class, TextOutputFormat.class);
         jssc.start();
         jssc.awaitTermination();
     }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5829:
--
Priority: Minor  (was: Major)

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar
Priority: Minor

 Spark master (1.3.0) with the SPARK-5826 patch.
 JavaStreamingContext.fileStream runs an empty task repeatedly when there are no more new files.
 To reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. Run the app.
   3. Put some text files into /testspark/watchdir.
 Every 30 seconds, the Spark log indicates that a new sub-task runs, and /testspark/resultdir/ gets a new directory with empty files.
 Even when no new files are added, a new task still runs on an empty RDD.
 {noformat}
 package my.test.hadoop.spark;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;

 import scala.Tuple2;

 public class TestStream {
     @SuppressWarnings({ "serial", "resource" })
     public static void main(String[] args) throws Exception {
         SparkConf conf = new SparkConf().setAppName("TestStream");
         JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
         jssc.checkpoint("/testspark/checkpointdir");

         Configuration jobConf = new Configuration();
         jobConf.set("my.test.fields", "fields");

         // Watch /testspark/watchdir and map every (offset, line) record to the pair (1, 1).
         JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                 LongWritable.class, Text.class, TextInputFormat.class,
                 new Function<Path, Boolean>() {
                     @Override
                     public Boolean call(Path v1) throws Exception {
                         return true;
                     }
                 }, true, jobConf).mapToPair(
                 new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                     @Override
                     public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                         return new Tuple2<Integer, Integer>(1, 1);
                     }
                 });

         // Sum the pairs per batch and write each batch's result under /testspark/resultdir/.
         JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
             @Override
             public Integer call(Integer arg0, Integer arg1) throws Exception {
                 return arg0 + arg1;
             }
         });

         rs.checkpoint(Durations.seconds(60));
         rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                 Integer.class, Integer.class, TextOutputFormat.class);
         jssc.start();
         jssc.awaitTermination();
     }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5796) Do not transform data on a last estimator in Pipeline

2015-02-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5796.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4590
[https://github.com/apache/spark/pull/4590]

 Do not transform data on a last estimator in Pipeline
 -

 Key: SPARK-5796
 URL: https://issues.apache.org/jira/browse/SPARK-5796
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 1.3.0
Reporter: Peter Rudenko
Priority: Minor
 Fix For: 1.3.0


 If it's the last estimator in a Pipeline, there's no need to transform the data, since 
 there is no next stage that would consume it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5815) Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS

2015-02-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5815.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4614
[https://github.com/apache/spark/pull/4614]

 Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS
 --

 Key: SPARK-5815
 URL: https://issues.apache.org/jira/browse/SPARK-5815
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Affects Versions: 1.3.0
Reporter: Xiangrui Meng
Assignee: Sean Owen
 Fix For: 1.3.0


 It is generally bad to expose types defined in a 3rd-party package in Spark 
 public APIs. We should deprecate those methods in SVDPlusPlus and replace 
 them in the next release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5769) Set params in constructor and setParams() in Python ML pipeline API

2015-02-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5769.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4564
[https://github.com/apache/spark/pull/4564]

 Set params in constructor and setParams() in Python ML pipeline API
 ---

 Key: SPARK-5769
 URL: https://issues.apache.org/jira/browse/SPARK-5769
 Project: Spark
  Issue Type: New Feature
  Components: ML, PySpark
Affects Versions: 1.3.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
 Fix For: 1.3.0


 As discussed in the design doc of SPARK-4586, we want to make Python users 
 happy (no setters/getters) while keeping a low maintenance cost by forcing 
 keyword arguments in the constructor and in setParams.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3866) Clean up python/run-tests problems

2015-02-15 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3866.
---
Resolution: Fixed

It looks like all of this issue's subtasks have been resolved, so I'm going to 
mark this as fixed.

 Clean up python/run-tests problems
 --

 Key: SPARK-3866
 URL: https://issues.apache.org/jira/browse/SPARK-3866
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.7.8, Java 1.8.0_20
Reporter: Tomohiko K.
  Labels: pyspark, testing
 Attachments: unit-tests.log


 This is an overhaul issue to remove the problems encountered when I run 
 ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
 It will have sub-tasks for the various kinds of issues.
 A test output log is contained in the attached file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-15 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322344#comment-14322344
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,


+1. I will upload the screenshot with these changes.

Thanks,
Twinkle

 Driver retries in yarn-cluster mode always fail if event logging is enabled
 ---

 Key: SPARK-4705
 URL: https://issues.apache.org/jira/browse/SPARK-4705
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
 Attachments: Screen Shot 2015-02-10 at 6.27.49 pm.png


 yarn-cluster mode will retry running the driver in certain failure modes. If 
 event logging is enabled, the retry will most probably fail, because:
 {noformat}
 Exception in thread "Driver" java.io.IOException: Log directory 
 hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
  already exists!
 at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
 at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
 at 
 org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
 at org.apache.spark.SparkContext.&lt;init&gt;(SparkContext.scala:353)
 {noformat}
 The event log path should be made more unique, or retries of the same app 
 should clean up the old logs first.
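 A minimal sketch of what either remedy might look like (a hypothetical helper, not the 
 actual EventLoggingListener code): make the directory name unique per attempt and clear 
 a stale directory left by a previous attempt of the same application.
 {noformat}
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}

 // Hypothetical helper: build an attempt-specific event log directory and remove any
 // leftover directory written by an earlier, failed attempt of the same application.
 def prepareEventLogDir(base: String, appId: String, attemptId: Option[String],
                        hadoopConf: Configuration): Path = {
   val name = attemptId.map(a => s"$appId-attempt-$a").getOrElse(appId)
   val dir = new Path(base, name)
   val fs = FileSystem.get(dir.toUri, hadoopConf)
   if (fs.exists(dir)) {
     fs.delete(dir, true) // clean up logs from a previous attempt
   }
   dir
 }
 {noformat}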



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322408#comment-14322408
 ] 

Littlestar edited comment on SPARK-5829 at 2/16/15 6:05 AM:


FileInputDStream.scala
{noformat}
  override def compute(validTime: Time): Option[RDD[(K, V)]] = {
    // Find new files
    val newFiles = findNewFiles(validTime.milliseconds)
    logInfo("New files at time " + validTime + ":\n" + newFiles.mkString("\n"))
    batchTimeToSelectedFiles += ((validTime, newFiles))
    recentlySelectedFiles ++= newFiles
    // Maybe a check that newFiles.size > 0 here can avoid this problem?
    Some(filesToRDD(newFiles))
  }
{noformat}
Thanks.
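
A minimal sketch of the suggested guard, written against the method quoted above 
(illustrative only, not the actual Spark change): skip producing an RDD for a batch 
in which no new files were found.
{noformat}
  override def compute(validTime: Time): Option[RDD[(K, V)]] = {
    val newFiles = findNewFiles(validTime.milliseconds)
    if (newFiles.isEmpty) {
      None // no new files in this batch interval, so do not schedule an empty job
    } else {
      logInfo("New files at time " + validTime + ":\n" + newFiles.mkString("\n"))
      batchTimeToSelectedFiles += ((validTime, newFiles))
      recentlySelectedFiles ++= newFiles
      Some(filesToRDD(newFiles))
    }
  }
{noformat}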


was (Author: cnstar9988):
FileInputDStream.scala
  override def compute(validTime: Time): Option[RDD[(K, V)]] = {
    // Find new files
    val newFiles = findNewFiles(validTime.milliseconds)
    logInfo("New files at time " + validTime + ":\n" + newFiles.mkString("\n"))
    batchTimeToSelectedFiles += ((validTime, newFiles))
    recentlySelectedFiles ++= newFiles
    // Maybe a check that newFiles.size > 0 here can avoid this problem?
    Some(filesToRDD(newFiles))
  }

Thanks.

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar
Priority: Minor

 Spark master (1.3.0) with the SPARK-5826 patch.
 JavaStreamingContext.fileStream runs an empty task repeatedly when there are no more new files.
 To reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. Run the app.
   3. Put some text files into /testspark/watchdir.
 Every 30 seconds, the Spark log indicates that a new sub-task runs, and /testspark/resultdir/ gets a new directory with empty files.
 Even when no new files are added, a new task still runs on an empty RDD.
 {noformat}
 package my.test.hadoop.spark;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;

 import scala.Tuple2;

 public class TestStream {
     @SuppressWarnings({ "serial", "resource" })
     public static void main(String[] args) throws Exception {
         SparkConf conf = new SparkConf().setAppName("TestStream");
         JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
         jssc.checkpoint("/testspark/checkpointdir");

         Configuration jobConf = new Configuration();
         jobConf.set("my.test.fields", "fields");

         // Watch /testspark/watchdir and map every (offset, line) record to the pair (1, 1).
         JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                 LongWritable.class, Text.class, TextInputFormat.class,
                 new Function<Path, Boolean>() {
                     @Override
                     public Boolean call(Path v1) throws Exception {
                         return true;
                     }
                 }, true, jobConf).mapToPair(
                 new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                     @Override
                     public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                         return new Tuple2<Integer, Integer>(1, 1);
                     }
                 });

         // Sum the pairs per batch and write each batch's result under /testspark/resultdir/.
         JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
             @Override
             public Integer call(Integer arg0, Integer arg1) throws Exception {
                 return arg0 + arg1;
             }
         });

         rs.checkpoint(Durations.seconds(60));
         rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                 Integer.class, Integer.class, TextOutputFormat.class);
         jssc.start();
         jssc.awaitTermination();
     }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5830) Don't create unnecessary directory for local root dir

2015-02-15 Thread Weizhong (JIRA)
Weizhong created SPARK-5830:
---

 Summary: Don't create unnecessary directory for local root dir
 Key: SPARK-5830
 URL: https://issues.apache.org/jira/browse/SPARK-5830
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Weizhong
Priority: Minor


Now an unnecessary extra directory is created for the local root directory, and this 
directory is not deleted after the application exits.
For example:
before, the temp dir was created as /tmp/spark-UUID;
now, the temp dir is created as /tmp/spark-UUID/spark-UUID,
so the outer dir /tmp/spark-UUID is not deleted, because it is not treated as a local root directory. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5830) Don't create unnecessary directory for local root dir

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322428#comment-14322428
 ] 

Apache Spark commented on SPARK-5830:
-

User 'Sephiroth-Lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/4620

 Don't create unnecessary directory for local root dir
 -

 Key: SPARK-5830
 URL: https://issues.apache.org/jira/browse/SPARK-5830
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Weizhong
Priority: Minor

 Now an unnecessary extra directory is created for the local root directory, and this 
 directory is not deleted after the application exits.
 For example:
 before, the temp dir was created as /tmp/spark-UUID;
 now, the temp dir is created as /tmp/spark-UUID/spark-UUID,
 so the outer dir /tmp/spark-UUID is not deleted, because it is not treated as a local root directory. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5831) When checkpoint file size is bigger than 10, then delete them

2015-02-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322433#comment-14322433
 ] 

Apache Spark commented on SPARK-5831:
-

User 'XuTingjun' has created a pull request for this issue:
https://github.com/apache/spark/pull/4621

 When checkpoint file size is bigger than 10, then delete them
 -

 Key: SPARK-5831
 URL: https://issues.apache.org/jira/browse/SPARK-5831
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: meiyoula
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5016) GaussianMixtureEM should distribute matrix inverse for large numFeatures, k

2015-02-15 Thread Manoj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322332#comment-14322332
 ] 

Manoj Kumar commented on SPARK-5016:


[~mengxr] Can you please clarify a few things?

1. How should the BreezeData be keyed in order to parallelize across the k 
Gaussians, considering that it is a soft assignment?
2. Even if we can do that, there are a few lines of code corresponding to the 
log-likelihood computation, as pointed out by [~tgaloppo], which are 
interdependent. How can that part be done?
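
On point 1, one possible reading, sketched minimally (illustrative only, not MLlib code; 
the per-Gaussian covariance matrices are assumed to be computed already): key each 
Gaussian by its index so that the k matrix inversions can be distributed.
{noformat}
import breeze.linalg.{DenseMatrix, inv}
import org.apache.spark.SparkContext

// Illustrative only: distribute the k covariance inversions by keying each Gaussian
// by its index; the soft-assignment sums producing the covariances happen beforehand.
def distributeInverses(sc: SparkContext,
                       covs: Array[DenseMatrix[Double]]): Array[DenseMatrix[Double]] = {
  val bcCovs = sc.broadcast(covs)
  sc.parallelize(covs.indices, covs.length)
    .map(i => (i, inv(bcCovs.value(i))))
    .collect()
    .sortBy(_._1)
    .map(_._2)
}
{noformat}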

 GaussianMixtureEM should distribute matrix inverse for large numFeatures, k
 ---

 Key: SPARK-5016
 URL: https://issues.apache.org/jira/browse/SPARK-5016
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.2.0
Reporter: Joseph K. Bradley

 If numFeatures or k are large, GMM EM should distribute the matrix inverse 
 computation for Gaussian initialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org