[jira] [Resolved] (SPARK-13182) Spark Executor retries infinitely

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13182.
---
   Resolution: Not A Problem
Fix Version/s: (was: 1.5.2)

I'm afraid that's an application problem. It's not clear to the framework why the 
JVM is dying, or that the failure isn't transient.

> Spark Executor retries infinitely
> -
>
> Key: SPARK-13182
> URL: https://issues.apache.org/jira/browse/SPARK-13182
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2
>Reporter: Prabhu Joseph
>Priority: Minor
>
>   When a Spark job (Spark 1.5.2) is submitted with a single executor and the 
> user passes invalid JVM arguments via spark.executor.extraJavaOptions, the 
> first executor fails. But the job keeps retrying, creating a new executor and 
> failing every time, until CTRL-C is pressed. 
> ./spark-submit --class SimpleApp --master "spark://10.10.72.145:7077"  --conf 
> "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
> -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=16" 
> /SPARK/SimpleApp.jar
> Here the user submits the job with ConcGCThreads=16, which is greater than 
> ParallelGCThreads, so the executor JVM fails to start. But the job does not 
> exit; it keeps creating executors and retrying.
> ..
> 16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20160201065319-0014/2846 on hostPort 10.10.72.145:36558 with 12 cores, 
> 2.0 GB RAM
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: 
> app-20160201065319-0014/2846 is now LOADING
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: 
> app-20160201065319-0014/2846 is now RUNNING
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: 
> app-20160201065319-0014/2846 is now EXITED (Command exited with code 1)
> 16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Executor 
> app-20160201065319-0014/2846 removed: Command exited with code 1
> 16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Asked to remove 
> non-existent executor 2846
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor added: 
> app-20160201065319-0014/2847 on worker-20160131230345-10.10.72.145-36558 
> (10.10.72.145:36558) with 12 cores
> 16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20160201065319-0014/2847 on hostPort 10.10.72.145:36558 with 12 cores, 
> 2.0 GB RAM
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: 
> app-20160201065319-0014/2847 is now LOADING
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: 
> app-20160201065319-0014/2847 is now EXITED (Command exited with code 1)
> 16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Executor 
> app-20160201065319-0014/2847 removed: Command exited with code 1
> 16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Asked to remove 
> non-existent executor 2847
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor added: 
> app-20160201065319-0014/2848 on worker-20160131230345-10.10.72.145-36558 
> (10.10.72.145:36558) with 12 cores
> 16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20160201065319-0014/2848 on hostPort 10.10.72.145:36558 with 12 cores, 
> 2.0 GB RAM
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: 
> app-20160201065319-0014/2848 is now LOADING
> 16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: 
> app-20160201065319-0014/2848 is now RUNNING
> Spark should not fall into an infinite retry loop on this kind of user error 
> on a production cluster.
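For reference, a minimal sketch (not from the report; the thread counts are assumed 
values) of passing the same GC flags in a way that lets the executor JVM start, by 
keeping ConcGCThreads at or below ParallelGCThreads:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: set ParallelGCThreads explicitly so that ConcGCThreads
// never exceeds it; otherwise the G1 executor JVM exits at startup (code 1).
val conf = new SparkConf()
  .setAppName("SimpleApp")
  .set("spark.executor.extraJavaOptions",
    "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseG1GC " +
      "-XX:InitiatingHeapOccupancyPercent=35 " +
      "-XX:ParallelGCThreads=16 -XX:ConcGCThreads=4")
val sc = new SparkContext(conf)
{code}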






[jira] [Updated] (SPARK-13197) When trying to select from the data frame which contains the columns with . in it, it is throwing exception.

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13197:
--
Fix Version/s: (was: 2.0.0)

> When trying to select from the data frame which contains the columns with . 
> in it, it is throwing exception.
> 
>
> Key: SPARK-13197
> URL: https://issues.apache.org/jira/browse/SPARK-13197
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Jayadevan M
>Priority: Minor
>
> When trying to select a column whose name contains a dot (.) from a data 
> frame, it throws the below exception.
> How to replicate:
> val df = Seq((1, 1)).toDF("a_b", "a.c")
> df.select("a.c").collect() 
> stacktrace below
> scala> df.select("a.c").collect() 
> org.apache.spark.sql.AnalysisException: cannot resolve 'a.c' given input 
> columns: [a_b, a.c];
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:284)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:284)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:283)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:109)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:119)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:123)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:123)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:128)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:128)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:122)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:45)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
>   at 
> org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:1803)
>   at org.apache.spark.sql.DataFrame.select(DataFrame.scala:704)
>   at org.apache.spark.sql.DataFrame.select(DataFrame.scala:721)
>   ... 49 elided
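As a hedged aside (a common workaround, not necessarily the resolution of this 
ticket): a column whose name literally contains a dot can usually be selected by 
escaping it with backticks, so the analyzer does not treat the dot as nested-field 
access:

{code}
// Sketch: backticks make "a.c" one quoted identifier instead of field access.
val df = Seq((1, 1)).toDF("a_b", "a.c")
df.select("`a.c`").collect()
df.select(df("`a.c`")).collect()   // equivalent, going through Column
{code}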





[jira] [Updated] (SPARK-13197) When trying to select from the data frame which contains the columns with . in it, it is throwing exception.

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13197:
--
Affects Version/s: (was: 2.0.0)

> When trying to select from the data frame which contains the columns with . 
> in it, it is throwing exception.
> 
>
> Key: SPARK-13197
> URL: https://issues.apache.org/jira/browse/SPARK-13197
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Jayadevan M
>Priority: Minor
>
> When trying to select a column whose name contains a dot (.) from a data 
> frame, it throws the below exception.
> How to replicate:
> val df = Seq((1, 1)).toDF("a_b", "a.c")
> df.select("a.c").collect() 
> stacktrace below
> scala> df.select("a.c").collect() 
> org.apache.spark.sql.AnalysisException: cannot resolve 'a.c' given input 
> columns: [a_b, a.c];
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:284)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:284)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:283)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:109)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:119)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:123)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:123)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:128)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:128)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:122)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:45)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
>   at 
> org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:1803)
>   at org.apache.spark.sql.DataFrame.select(DataFrame.scala:704)
>   at org.apache.spark.sql.DataFrame.select(DataFrame.scala:721)
>   ... 49 elided





[jira] [Updated] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13102:
--
   Labels:   (was: UI)
Fix Version/s: (was: 2.0.0)

> Run query using ThriftServer, and open web using IE11, i  click ”+detail" in 
> SQLPage, but not response
> --
>
> Key: SPARK-13102
> URL: https://issues.apache.org/jira/browse/SPARK-13102
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
> Attachments: dag info is blank.png, details in SQLPage.png
>
>
> I run a query using ThriftServer and open the web UI in IE11. Then I click 
> "+detail" on the SQL page, but there is no response. And when I click "DAG 
> Visualization" on the stages page, I get nothing.






[jira] [Commented] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-07 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136173#comment-15136173
 ] 

holdenk commented on SPARK-13172:
-

Great! Let me know if you have any questions :)

> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable.getStackTrace is the recommended alternative.
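A small self-contained sketch of the suggested replacement (illustrative only):

{code}
// Use java.lang.Throwable#getStackTrace directly instead of the deprecated
// RichException helper.
val e = new Exception("example")
val frames: Array[StackTraceElement] = e.getStackTrace
frames
  .map(f => s"${f.getClassName}.${f.getMethodName}(${f.getFileName}:${f.getLineNumber})")
  .foreach(println)
{code}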






[jira] [Updated] (SPARK-12868) ADD JAR via sparkSQL JDBC will fail when using a HDFS URL

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12868:
--
Fix Version/s: (was: 1.6.1)

> ADD JAR via sparkSQL JDBC will fail when using a HDFS URL
> -
>
> Key: SPARK-12868
> URL: https://issues.apache.org/jira/browse/SPARK-12868
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Trystan Leftwich
>
> When trying to add a jar with an HDFS URI, i.e.
> {code:sql}
> ADD JAR hdfs:///tmp/foo.jar
> {code}
> via the Spark SQL JDBC interface, it will fail with:
> {code:sql}
> java.net.MalformedURLException: unknown protocol: hdfs
> at java.net.URL.<init>(URL.java:593)
> at java.net.URL.<init>(URL.java:483)
> at java.net.URL.<init>(URL.java:432)
> at java.net.URI.toURL(URI.java:1089)
> at 
> org.apache.spark.sql.hive.client.ClientWrapper.addJar(ClientWrapper.scala:578)
> at org.apache.spark.sql.hive.HiveContext.addJar(HiveContext.scala:652)
> at org.apache.spark.sql.hive.execution.AddJar.run(commands.scala:89)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:211)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
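For context, a hedged sketch (an assumption about one possible remedy, not 
necessarily the project's fix): java.net.URL only understands schemes with a 
registered stream handler, so "hdfs" fails unless Hadoop's handler factory has 
been installed once in the JVM:

{code}
import java.net.URL
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

// URL.setURLStreamHandlerFactory may only be called once per JVM.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory(new Configuration()))

// After registration this no longer throws MalformedURLException.
val jarUrl = new URL("hdfs:///tmp/foo.jar")
{code}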






[jira] [Updated] (SPARK-13072) simplify and improve murmur3 hash expression codegen

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13072:
--
Assignee: Wenchen Fan

> simplify and improve murmur3 hash expression codegen
> 
>
> Key: SPARK-13072
> URL: https://issues.apache.org/jira/browse/SPARK-13072
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>







[jira] [Updated] (SPARK-13098) remove GenericInternalRowWithSchema

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13098:
--
Assignee: Wenchen Fan

> remove GenericInternalRowWithSchema
> ---
>
> Key: SPARK-13098
> URL: https://issues.apache.org/jira/browse/SPARK-13098
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>







[jira] [Updated] (SPARK-13215) Remove fallback in codegen

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13215:
--
Assignee: Davies Liu

> Remove fallback in codegen
> --
>
> Key: SPARK-13215
> URL: https://issues.apache.org/jira/browse/SPARK-13215
> Project: Spark
>  Issue Type: Improvement
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> In newMutableProjection, we fall back to InterpretedMutableProjection if 
> compilation fails.
> Since we removed the configuration for codegen, we rely heavily on codegen 
> (and TungstenAggregate requires the generated MutableProjection to update 
> UnsafeRow), so we should remove the fallback, which could confuse users; see 
> the discussion in SPARK-13116.






[jira] [Updated] (SPARK-12989) Bad interaction between StarExpansion and ExtractWindowExpressions

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12989:
--
Assignee: Xiao Li

> Bad interaction between StarExpansion and ExtractWindowExpressions
> --
>
> Key: SPARK-12989
> URL: https://issues.apache.org/jira/browse/SPARK-12989
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>Assignee: Xiao Li
> Fix For: 1.6.1, 2.0.0
>
>
> Reported initially here: 
> http://stackoverflow.com/questions/34995376/apache-spark-window-function-with-nested-column
> {code}
> import sqlContext.implicits._
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.expressions.Window
> sql("SET spark.sql.eagerAnalysis=false") // Let us see the error even though 
> we are constructing an invalid tree
> val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", 
> "num")
>   .withColumn("Data", struct("A", "B", "C"))
>   .drop("A")
>   .drop("B")
>   .drop("C")
> val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
> data.select($"*", max("num").over(winSpec) as "max").explain(true)
> {code}
> When you run this, the analyzer inserts invalid columns into a projection, as 
> seen below:
> {code}
> == Parsed Logical Plan ==
> 'Project [*,'max('num) windowspecdefinition('Data.A,'Data.B,'num 
> DESC,UnspecifiedFrame) AS max#64928]
> +- Project [num#64926,Data#64927]
>+- Project [C#64925,num#64926,Data#64927]
>   +- Project [B#64924,C#64925,num#64926,Data#64927]
>  +- Project 
> [A#64923,B#64924,C#64925,num#64926,struct(A#64923,B#64924,C#64925) AS 
> Data#64927]
> +- Project [_1#64919 AS A#64923,_2#64920 AS B#64924,_3#64921 AS 
> C#64925,_4#64922 AS num#64926]
>+- LocalRelation [_1#64919,_2#64920,_3#64921,_4#64922], 
> [[a,b,c,3],[c,b,a,3]]
> == Analyzed Logical Plan ==
> num: int, Data: struct, max: int
> Project [num#64926,Data#64927,max#64928]
> +- Project [num#64926,Data#64927,A#64932,B#64933,max#64928,max#64928]
>+- Window [num#64926,Data#64927,A#64932,B#64933], 
> [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax(num#64926)
>  windowspecdefinition(A#64932,B#64933,num#64926 DESC,RANGE BETWEEN UNBOUNDED 
> PRECEDING AND CURRENT ROW) AS max#64928], [A#64932,B#64933], [num#64926 DESC]
>   +- !Project [num#64926,Data#64927,A#64932,B#64933]
>  +- Project [num#64926,Data#64927]
> +- Project [C#64925,num#64926,Data#64927]
>+- Project [B#64924,C#64925,num#64926,Data#64927]
>   +- Project 
> [A#64923,B#64924,C#64925,num#64926,struct(A#64923,B#64924,C#64925) AS 
> Data#64927]
>  +- Project [_1#64919 AS A#64923,_2#64920 AS 
> B#64924,_3#64921 AS C#64925,_4#64922 AS num#64926]
> +- LocalRelation 
> [_1#64919,_2#64920,_3#64921,_4#64922], [[a,b,c,3],[c,b,a,3]]
> {code}






[jira] [Updated] (SPARK-13073) creating R like summary for logistic Regression in Spark - Scala

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13073:
--
Target Version/s:   (was: 1.6.0)

> creating R like summary for logistic Regression in Spark - Scala
> 
>
> Key: SPARK-13073
> URL: https://issues.apache.org/jira/browse/SPARK-13073
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Reporter: Samsudhin
>Priority: Minor
>
> Currently Spark ML provides coefficients for logistic regression. To evaluate 
> the trained model, tests such as the Wald test and chi-square test should be 
> run and their results summarized and displayed like R's GLM summary.
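For comparison, a rough sketch of what the existing Spark ML API already exposes 
(assuming a `training` DataFrame with "label" and "features" columns; significance 
tests like Wald or chi-square are not part of it):

{code}
import org.apache.spark.ml.classification.LogisticRegression

// `training` is an assumed DataFrame of labeled feature vectors.
val lr = new LogisticRegression().setMaxIter(100).setRegParam(0.01)
val model = lr.fit(training)

println(model.intercept)
println(model.coefficients)                      // point estimates only
model.summary.objectiveHistory.foreach(println)  // loss per iteration, no p-values
{code}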






[jira] [Updated] (SPARK-12992) Vectorize parquet decoding using ColumnarBatch

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12992:
--
Fix Version/s: (was: 2.0.0)

> Vectorize parquet decoding using ColumnarBatch
> --
>
> Key: SPARK-12992
> URL: https://issues.apache.org/jira/browse/SPARK-12992
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Nong Li
>Assignee: Apache Spark
>
> Parquet files benefit from vectorized decoding. ColumnarBatches have been 
> designed to support this. This means that a single encoded parquet column is 
> decoded to a single ColumnVector. 






[jira] [Updated] (SPARK-13180) Protect against SessionState being null when accessing HiveClientImpl#conf

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13180:
--
Component/s: SQL

> Protect against SessionState being null when accessing HiveClientImpl#conf
> --
>
> Key: SPARK-13180
> URL: https://issues.apache.org/jira/browse/SPARK-13180
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-13180-util.patch
>
>
> See this thread http://search-hadoop.com/m/q3RTtFoTDi2HVCrM1
> {code}
> java.lang.NullPointerException
> at 
> org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
> at 
> org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
> at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
> at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538)
> at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537)
> at 
> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
> at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
> at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:457)
> at 
> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457)
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456)
> at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:473)
> at 
> org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473)
> at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472)
> at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
> at 
> org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
> {code}
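A generic sketch of the guard the title describes (illustrative only, not the 
attached patch):

{code}
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

// Fall back to a fresh HiveConf when no SessionState is attached to the
// current thread, instead of dereferencing a null SessionState.
def safeHiveConf(): HiveConf =
  Option(SessionState.get()).map(_.getConf).getOrElse(new HiveConf())
{code}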






[jira] [Created] (SPARK-13228) HiveSparkSubmitSuite is flaky

2016-02-07 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-13228:
-

 Summary: HiveSparkSubmitSuite is flaky
 Key: SPARK-13228
 URL: https://issues.apache.org/jira/browse/SPARK-13228
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Herman van Hovell


We are having some problems with 
{{org.apache.spark.sql.hive.HiveSparkSubmitSuite}}. The following tests are 
failing because of timeouts:
* SPARK-9757 Persist Parquet relation with decimal column
* SPARK-8020: set sql conf in spark conf

See the following builds for examples:
* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50891/consoleFull
* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50887/consoleFull
* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50880/consoleFull
* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50892/consoleFull






[jira] [Updated] (SPARK-13113) Remove unnecessary bit operation when decoding page number

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13113:
--
Assignee: Liang-Chi Hsieh

> Remove unnecessary bit operation when decoding page number
> --
>
> Key: SPARK-13113
> URL: https://issues.apache.org/jira/browse/SPARK-13113
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 2.0.0
>
>
> Since we shift the bits right when decoding the page number, the bitwise AND 
> operation looks unnecessary.
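A self-contained sketch of the observation (constants chosen for illustration; they 
mirror a layout where the page number lives in the top bits of a 64-bit encoded 
address):

{code}
val PAGE_NUMBER_BITS = 13
val OFFSET_BITS = 64 - PAGE_NUMBER_BITS

def encodePageNumberAndOffset(pageNumber: Int, offset: Long): Long =
  (pageNumber.toLong << OFFSET_BITS) | offset

// The unsigned right shift already leaves only the page-number bits, so a
// trailing "& ((1L << PAGE_NUMBER_BITS) - 1)" adds nothing.
def decodePageNumber(encoded: Long): Int =
  (encoded >>> OFFSET_BITS).toInt
{code}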






[jira] [Updated] (SPARK-13067) DataFrameSuite.simple explode fail locally

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13067:
--
Assignee: Wenchen Fan

> DataFrameSuite.simple explode fail locally
> --
>
> Key: SPARK-13067
> URL: https://issues.apache.org/jira/browse/SPARK-13067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>
> This test only failed locally
> {code}
> [info] - simple explode *** FAILED *** (41 milliseconds)
> [info]   Failed to parse logical plan to JSON:
> [info]   Project [word#80]
> [info]   +- Generate UserDefinedGenerator(words#78), true, false, None, 
> [word#80]
> [info]  +- Project [_1#77 AS words#78]
> [info] +- LocalRelation [_1#77], [[a b c],[d e]] (QueryTest.scala:211)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:496)
> [info]   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
> [info]   at org.scalatest.Assertions$class.fail(Assertions.scala:1348)
> [info]   at org.scalatest.FunSuite.fail(FunSuite.scala:1555)
> [info]   at 
> org.apache.spark.sql.QueryTest.checkJsonFormat(QueryTest.scala:211)
> [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:132)
> [info]   at 
> org.apache.spark.sql.DataFrameSuite$$anonfun$11.apply$mcV$sp(DataFrameSuite.scala:136)
> [info]   at 
> org.apache.spark.sql.DataFrameSuite$$anonfun$11.apply(DataFrameSuite.scala:133)
> [info]   at 
> org.apache.spark.sql.DataFrameSuite$$anonfun$11.apply(DataFrameSuite.scala:133)
> [info]   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
> [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
> [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
> [info]   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
> [info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
> [info]   at scala.collection.immutable.List.foreach(List.scala:318)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
> [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
> [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
> [info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
> [info]   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
> [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
> [info]   at 
> org.apache.spark.sql.DataFrameSuite.org$scalatest$BeforeAndAfterAll$$super$run(DataFrameSuite.scala:36)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
> [info]   at org.apache.spark.sql.DataFrameSuite.run(DataFrameSuite.scala:36)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
> [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> [info]   at 
> 

[jira] [Updated] (SPARK-12926) SQLContext to display warning message when non-sql configs are being set

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12926:
--
Assignee: Tejas Patil

> SQLContext to display warning message when non-sql configs are being set
> 
>
> Key: SPARK-12926
> URL: https://issues.apache.org/jira/browse/SPARK-12926
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Users unknowingly try to set core Spark configs in sqlContext but later 
> realise that it didn't work.
> {color:red}
> scala> sqlContext.sql("SET spark.shuffle.memoryFraction=0.4")
> res3: org.apache.spark.sql.DataFrame = [key: string, value: string]
> scala> sqlContext.getConf.get("spark.shuffle.memoryFraction")
> java.util.NoSuchElementException: spark.shuffle.memoryFraction
>   at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:193)
>   at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:193)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.SparkConf.get(SparkConf.scala:193)
> {color}
> We could do this (sketched below):
> - for config keys starting with "spark.", allow them only if they start with 
> "spark.sql."; otherwise disallow (or at least warn).
> - allow anything else.
> This will be a simple change in SqlConf :
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala#L621
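A self-contained sketch of the proposed check (the exact warning text and hook 
point are assumptions):

{code}
def isSqlConfKey(key: String): Boolean =
  !key.startsWith("spark.") || key.startsWith("spark.sql.")

def setConfChecked(key: String, value: String, set: (String, String) => Unit): Unit = {
  if (!isSqlConfKey(key)) {
    Console.err.println(
      s"Warning: '$key' is not a spark.sql.* property; setting it via SQLContext has no effect.")
  }
  set(key, value)   // delegate to the existing conf-setting path
}
{code}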






[jira] [Updated] (SPARK-12261) pyspark crash for large dataset

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12261:
--
Component/s: PySpark

[~joshrosen] do you have a lead on what else is at issue here? it did not look 
like a Spark issue to me. I see you reopened.

> pyspark crash for large dataset
> ---
>
> Key: SPARK-12261
> URL: https://issues.apache.org/jira/browse/SPARK-12261
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.2
> Environment: windows
>Reporter: zihao
>
> I tried to import a local text file (over 100 MB) via textFile in PySpark. When 
> I ran data.take(), it failed and gave error messages including:
> 15/12/10 17:17:43 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
> aborting job
> Traceback (most recent call last):
>   File "E:/spark_python/test3.py", line 9, in 
> lines.take(5)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\rdd.py", line 1299, 
> in take
> res = self.context.runJob(self, takeUpToNumLeft, p)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\context.py", line 
> 916, in runJob
> port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, 
> partitions)
>   File "C:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 813, in 
> __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\sql\utils.py", line 
> 36, in deco
> return f(*a, **kw)
>   File "C:\Anaconda2\lib\site-packages\py4j\protocol.py", line 308, in 
> get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0, localhost): java.net.SocketException: Connection reset by peer: 
> socket write error
> Then I ran the same code on a small text file, and this time .take() worked fine.
> How can I solve this problem?






[jira] [Updated] (SPARK-13064) api/v1/application/jobs/attempt lacks "attemptId" field for spark-shell

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13064:
--
Component/s: Spark Shell

> api/v1/application/jobs/attempt lacks "attemptId" field for spark-shell
> --
>
> Key: SPARK-13064
> URL: https://issues.apache.org/jira/browse/SPARK-13064
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Reporter: Zhuo Liu
>Priority: Minor
>
> Any application launched with spark-shell will not have an attemptId field in 
> its REST API output. From the REST API point of view, we might want to force an 
> id for it, i.e., "1".
> {code}
> {
>   "id" : "application_1453789230389_377545",
>   "name" : "PySparkShell",
>   "attempts" : [ {
> "startTime" : "2016-01-28T02:17:11.035GMT",
> "endTime" : "2016-01-28T02:30:01.355GMT",
> "lastUpdated" : "2016-01-28T02:30:01.516GMT",
> "duration" : 770320,
> "sparkUser" : "huyng",
> "completed" : true
>   } ]
> }
> {code}






[jira] [Updated] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13117:
--
Component/s: Web UI

> WebUI should use the local ip not 0.0.0.0
> -
>
> Key: SPARK-13117
> URL: https://issues.apache.org/jira/browse/SPARK-13117
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: Jeremiah Jordan
>
> When SPARK_LOCAL_IP is set, everything seems to correctly bind to and use that 
> IP except the WebUI. The WebUI should use SPARK_LOCAL_IP rather than always 
> binding to 0.0.0.0.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137






[jira] [Updated] (SPARK-13083) Small spark sql queries get blocked if there is a long running query over a lot a partitions

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13083:
--
Component/s: SQL

> Small spark sql queries get blocked if there is a long running query over a 
> lot a partitions
> 
>
> Key: SPARK-13083
> URL: https://issues.apache.org/jira/browse/SPARK-13083
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Vishal Gupta
>Assignee: Michael Armbrust
>
> Steps to reproduce :
> a) Run a first query doing count(*) over a lot of partitions (~4500 partitions) 
> in S3.
> b) The Spark job for the first query starts running.
> c) Run a second query "show tables" against the same Spark application (I did it 
> using Zeppelin).
> d) As soon as the second query "show tables" is submitted, it starts showing 
> up in the "Spark Application UI" > "SQL".
> e) At this point there is only one active job running in the application 
> which corresponds to the first query.
> f) Only after the job for the first query is near completion, the job for 
> "show tables" starts appearing in "Spark Application UI" > "Jobs". 
> g) As soon as the job for "show tables" starts, it completes very fast and 
> gives the results.
> Sometimes step (c) has to be performed after 1-2 minutes of execution of the 
> long-running query. After this point, jobs do not get started for any number of 
> smaller queries submitted to the Spark application until the long-running query 
> is near completion.
> They seem to be blocked on the long-running query. Ideally, they should have 
> started running, as all the settings are for the fair scheduler.
> I am running Spark 1.5.1. In addition, I have the following configs:
> {code}
> spark.scheduler.mode FAIR
> spark.scheduler.allocation.file /usr/lib/spark/conf/fairscheduler.xml
> {code}
> /usr/lib/spark/conf/fairscheduler.xml has the following contents 
> {code}
> <?xml version="1.0"?>
> <allocations>
>   <pool name="...">
>     <schedulingMode>FAIR</schedulingMode>
>   </pool>
> </allocations>
> {code}
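A hedged aside (general fair-scheduler behaviour, not a diagnosis of this report): 
within a single application, FAIR scheduling only separates jobs submitted from 
different threads, and a job joins a pool through the "spark.scheduler.pool" local 
property, roughly as follows ("adhoc" is an assumed pool name; `sc` and `sqlContext` 
as in the shell):

{code}
val shortQueryThread = new Thread(new Runnable {
  override def run(): Unit = {
    // Local properties are per-thread, so set the pool inside this thread.
    sc.setLocalProperty("spark.scheduler.pool", "adhoc")
    sqlContext.sql("show tables").collect()
  }
})
shortQueryThread.start()
{code}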






[jira] [Resolved] (SPARK-13104) Spark Metrics currently does not return executors hostname

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13104.
---
Resolution: Invalid

Questions should really go to u...@spark.apache.org

> Spark Metrics currently does not return executors hostname 
> ---
>
> Key: SPARK-13104
> URL: https://issues.apache.org/jira/browse/SPARK-13104
> Project: Spark
>  Issue Type: Question
>Reporter: Karthik
>Priority: Critical
>  Labels: executor, executorId, graphite, hostname, metrics
>
> We have been using Spark metrics and porting the data to InfluxDB using the 
> Graphite sink that is available in Spark. From what I can see, it only 
> provides the executorId and not the executor hostname. With each Spark job, 
> the executorId changes. Is there any way to find the hostname based on the 
> executorId?
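Not an official metrics feature, but one hedged sketch of how an application could 
record the executorId-to-host mapping itself via a listener (`sc` is an assumed 
existing SparkContext):

{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded}
import scala.collection.mutable

val executorHosts = mutable.Map[String, String]()
sc.addSparkListener(new SparkListener {
  // Record each executor's host as it registers with the driver.
  override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit =
    executorHosts(event.executorId) = event.executorInfo.executorHost
})
{code}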






[jira] [Commented] (SPARK-12591) NullPointerException using checkpointed mapWithState with KryoSerializer

2016-02-07 Thread Yuval Itzchakov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136208#comment-15136208
 ] 

Yuval Itzchakov commented on SPARK-12591:
-

I'm seeing this error as well running Spark 1.6.0 with KryoSerializer.

This error started happening after I reset a spark-worker node while a job is 
running. This hasn't happened previously and doesn't occur unless I restart a 
worker node.
[~zsxwing] Was this supposed to be fixed in 1.6.0? Or is there still a need to 
manually patch this until 1.6.1?

> NullPointerException using checkpointed mapWithState with KryoSerializer
> 
>
> Key: SPARK-12591
> URL: https://issues.apache.org/jira/browse/SPARK-12591
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: MacOSX
> Java(TM) SE Runtime Environment (build 1.8.0_20-ea-b17)
>Reporter: Jan Uyttenhove
>Assignee: Shixiong Zhu
> Fix For: 1.6.1, 2.0.0
>
> Attachments: Screen Shot 2016-01-27 at 10.09.18 AM.png
>
>
> The issue occurred after upgrading to RC4 of Spark (Streaming) 1.6.0 to 
> (re)test the new mapWithState API, after previously reporting issue 
> SPARK-11932 (https://issues.apache.org/jira/browse/SPARK-11932). 
> For initial report, see 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-streaming-1-6-0-RC4-NullPointerException-using-mapWithState-tt15830.html
> I narrowed it down to an issue unrelated to the Kafka direct stream, but, after 
> observing very unpredictable behavior as a result of changes to the Kafka 
> message format, it seems to be related specifically to Kryo serialization.
> For test case, see my modified version of the StatefulNetworkWordCount 
> example: https://gist.github.com/juyttenh/9b4a4103699a7d5f698f 
> To reproduce, use RC4 of Spark-1.6.0 and 
> - start nc:
> {code}nc -lk {code}
> - execute the supplied test case: 
> {code}bin/spark-submit --class 
> org.apache.spark.examples.streaming.StatefulNetworkWordCount --master 
> local[2] file:///some-assembly-jar localhost {code}
> Error scenario:
> - put some text in the nc console with the job running, and observe correct 
> functioning of the word count
> - kill the spark job
> - add some more text in the nc console (with the job not running)
> - restart the spark job and observe the NPE
> (you might need to repeat this a couple of times to trigger the exception)
> Here's the stacktrace: 
> {code}
> 15/12/31 11:43:47 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 5)
> java.lang.NullPointerException
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:103)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:56)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:154)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:148)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Comment Edited] (SPARK-12591) NullPointerException using checkpointed mapWithState with KryoSerializer

2016-02-07 Thread Yuval Itzchakov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136208#comment-15136208
 ] 

Yuval Itzchakov edited comment on SPARK-12591 at 2/7/16 10:04 AM:
--

I'm seeing this error as well running Spark 1.6.0 with KryoSerializer.

This error started happening after I reset a spark-worker node while a job is 
running. This hasn't happened previously and doesn't occur unless I restart a 
worker node.
[~zsxwing] - I'm assuming this patch needs to be applied until 1.6.1, am I 
right?


was (Author: yuval.itzchakov):
I'm seeing this error as well running Spark 1.6.0 with KryoSerializer.

This error started happening after I reset a spark-worker node while a job is 
running. This hasn't happened previously and doesn't occur unless I restart a 
worker node.
[~zsxwing] - Was this supposed to be fixed in 1.6.0? Or is there still a need 
to manually patch this until 1.6.1?

> NullPointerException using checkpointed mapWithState with KryoSerializer
> 
>
> Key: SPARK-12591
> URL: https://issues.apache.org/jira/browse/SPARK-12591
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: MacOSX
> Java(TM) SE Runtime Environment (build 1.8.0_20-ea-b17)
>Reporter: Jan Uyttenhove
>Assignee: Shixiong Zhu
> Fix For: 1.6.1, 2.0.0
>
> Attachments: Screen Shot 2016-01-27 at 10.09.18 AM.png
>
>
> The issue occurred after upgrading to RC4 of Spark (Streaming) 1.6.0 to 
> (re)test the new mapWithState API, after previously reporting issue 
> SPARK-11932 (https://issues.apache.org/jira/browse/SPARK-11932). 
> For initial report, see 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-streaming-1-6-0-RC4-NullPointerException-using-mapWithState-tt15830.html
> I narrowed it down to an issue unrelated to the Kafka direct stream, but, after 
> observing very unpredictable behavior as a result of changes to the Kafka 
> message format, it seems to be related specifically to Kryo serialization.
> For test case, see my modified version of the StatefulNetworkWordCount 
> example: https://gist.github.com/juyttenh/9b4a4103699a7d5f698f 
> To reproduce, use RC4 of Spark-1.6.0 and 
> - start nc:
> {code}nc -lk {code}
> - execute the supplied test case: 
> {code}bin/spark-submit --class 
> org.apache.spark.examples.streaming.StatefulNetworkWordCount --master 
> local[2] file:///some-assembly-jar localhost {code}
> Error scenario:
> - put some text in the nc console with the job running, and observe correct 
> functioning of the word count
> - kill the spark job
> - add some more text in the nc console (with the job not running)
> - restart the spark job and observe the NPE
> (you might need to repeat this a couple of times to trigger the exception)
> Here's the stacktrace: 
> {code}
> 15/12/31 11:43:47 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 5)
> java.lang.NullPointerException
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:103)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:56)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:154)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:148)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at 

[jira] [Resolved] (SPARK-13132) LogisticRegression spends 35% of its time fetching the standardization parameter

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13132.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11027
[https://github.com/apache/spark/pull/11027]

> LogisticRegression spends 35% of its time fetching the standardization 
> parameter
> 
>
> Key: SPARK-13132
> URL: https://issues.apache.org/jira/browse/SPARK-13132
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Gary King
> Fix For: 2.0.0
>
>
> when L1 regularization is used, the inner functor passed to the quasi-newton 
> optimizer in {{org.apache.spark.ml.classification.LogisticRegression#train}} 
> makes repeated calls to {{$(standardization)}}. because this ultimately 
> involves repeated string interpolation triggered by 
> {{org.apache.spark.ml.param.Param#hashCode}}, this line of code consumes 
> 35%-45% of the entire training time in my application.
> the range depends on whether the application sets an explicit value for the 
> standardization parameter or relies on the default value (which needs an 
> extra map lookup, resulting in an extra string interpolation, compared to the 
> explicitly set case)
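A generic sketch of the fix direction (the names below are illustrative, not the
actual Spark internals): evaluate the parameter once, outside the closure that
the optimizer invokes on every iteration, instead of re-reading it (and
re-computing Param#hashCode) inside it.
{code}
// stand-in for the costly $(standardization) lookup described above
def expensiveLookup(): Boolean = { Thread.sleep(1); true }

// slow: the lookup is re-evaluated on every invocation of the functor
val slowFun = (i: Int) => if (expensiveLookup()) i * 2.0 else i.toDouble

// fast: hoist the lookup into a local val that the closure captures once
val standardization = expensiveLookup()
val fastFun = (i: Int) => if (standardization) i * 2.0 else i.toDouble
{code}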



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13132) LogisticRegression spends 35% of its time fetching the standardization parameter

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13132:
--
Priority: Minor  (was: Major)

> LogisticRegression spends 35% of its time fetching the standardization 
> parameter
> 
>
> Key: SPARK-13132
> URL: https://issues.apache.org/jira/browse/SPARK-13132
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Gary King
>Assignee: Gary King
>Priority: Minor
> Fix For: 2.0.0
>
>
> when L1 regularization is used, the inner functor passed to the quasi-newton 
> optimizer in {{org.apache.spark.ml.classification.LogisticRegression#train}} 
> makes repeated calls to {{$(standardization)}}. because this ultimately 
> involves repeated string interpolation triggered by 
> {{org.apache.spark.ml.param.Param#hashCode}}, this line of code consumes 
> 35%-45% of the entire training time in my application.
> the range depends on whether the application sets an explicit value for the 
> standardization parameter or relies on the default value (which needs an 
> extra map lookup, resulting in an extra string interpolation, compared to the 
> explicitly set case)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13132) LogisticRegression spends 35% of its time fetching the standardization parameter

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13132:
--
Assignee: Gary King

> LogisticRegression spends 35% of its time fetching the standardization 
> parameter
> 
>
> Key: SPARK-13132
> URL: https://issues.apache.org/jira/browse/SPARK-13132
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Gary King
>Assignee: Gary King
> Fix For: 2.0.0
>
>
> when L1 regularization is used, the inner functor passed to the quasi-newton 
> optimizer in {{org.apache.spark.ml.classification.LogisticRegression#train}} 
> makes repeated calls to {{$(standardization)}}. because this ultimately 
> involves repeated string interpolation triggered by 
> {{org.apache.spark.ml.param.Param#hashCode}}, this line of code consumes 
> 35%-45% of the entire training time in my application.
> the range depends on whether the application sets an explicit value for the 
> standardization parameter or relies on the default value (which needs an 
> extra map lookup, resulting in an extra string interpolation, compared to the 
> explicitly set case)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13200) Investigate math.round on integer number in MFDataGenerator.scala:109

2016-02-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13200:


Assignee: Apache Spark

> Investigate math.round on integer number in MFDataGenerator.scala:109
> -
>
> Key: SPARK-13200
> URL: https://issues.apache.org/jira/browse/SPARK-13200
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Minor
>
> Apparently we are calling round on an integer which now in Scala 2.11 results 
> in a warning (it didn't make any sense before either). Figure out if this is 
> a mistake we can just remove or if we got the types wrong somewhere.
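For reference, a minimal REPL illustration of the kind of warning in question
(assuming Scala 2.11):
{code}
scala> math.round(42)     // Int argument: rounding an integer is a no-op, and 2.11 warns about this overload
scala> math.round(42.5)   // Double argument: the intended use, returns 43
{code}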



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13200) Investigate math.round on integer number in MFDataGenerator.scala:109

2016-02-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13200:


Assignee: (was: Apache Spark)

> Investigate math.round on integer number in MFDataGenerator.scala:109
> -
>
> Key: SPARK-13200
> URL: https://issues.apache.org/jira/browse/SPARK-13200
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: holdenk
>Priority: Minor
>
> Apparently we are calling round on an integer which now in Scala 2.11 results 
> in a warning (it didn't make any sense before either). Figure out if this is 
> a mistake we can just remove or if we got the types wrong somewhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13200) Investigate math.round on integer number in MFDataGenerator.scala:109

2016-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136177#comment-15136177
 ] 

Apache Spark commented on SPARK-13200:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/0

> Investigate math.round on integer number in MFDataGenerator.scala:109
> -
>
> Key: SPARK-13200
> URL: https://issues.apache.org/jira/browse/SPARK-13200
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: holdenk
>Priority: Minor
>
> Apparently we are calling round on an integer which now in Scala 2.11 results 
> in a warning (it didn't make any sense before either). Figure out if this is 
> a mistake we can just remove or if we got the types wrong somewhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5159.
--
   Resolution: Cannot Reproduce
Fix Version/s: (was: 1.5.2)

> Thrift server does not respect hive.server2.enable.doAs=true
> 
>
> Key: SPARK-5159
> URL: https://issues.apache.org/jira/browse/SPARK-5159
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andrew Ray
> Attachments: spark_thrift_server_log.txt
>
>
> I'm currently testing the spark sql thrift server on a kerberos secured 
> cluster in YARN mode. Currently any user can access any table regardless of 
> HDFS permissions as all data is read as the hive user. In HiveServer2 the 
> property hive.server2.enable.doAs=true causes all access to be done as the 
> submitting user. We should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13140) spark sql aggregate performance decrease

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13140.
---
Resolution: Invalid

Since there's no real way to understand enough from this description to make an 
actionable JIRA, and what you have at the moment is a question, can you start at 
u...@spark.apache.org? If you have a more specific reproduction and/or a 
solution, you can open a JIRA.

> spark sql  aggregate performance decrease  
> ---
>
> Key: SPARK-13140
> URL: https://issues.apache.org/jira/browse/SPARK-13140
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: spencerlee
>   Original Estimate: 10h
>  Remaining Estimate: 10h
>
> In our scenario, there are 30+ key columns and 60+ metric columns.
> Our typical query is: select key1, key2, key3, key4, key5, sum(metric1), 
> sum(metric2), sum(metric3), ..., sum(metric30) from table_name group by key1, 
> key2, key3, key4, key5.
> I import a single Parquet file (60 MB, about 2.5 million records) into Spark 
> SQL and run the typical query in local mode. I found that when I aggregate 
> only 24 metrics, the response time is about 4.81s, but when I aggregate 25 or 
> more metrics, the response time is 45.9s, which is almost 10 times slower. 
> That seems clearly unreasonable.
> Is this a bug, or do I need to modify some configuration to tune the query?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13213) BroadcastNestedLoopJoin is very slow

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13213:
--
Component/s: SQL

[~davies] again you'd be doing us a favor if you would set the component on 
your JIRAs. I notice yours almost never have one.

> BroadcastNestedLoopJoin is very slow
> 
>
> Key: SPARK-13213
> URL: https://issues.apache.org/jira/browse/SPARK-13213
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>
> Since we have improved the performance of CartesianProduct, which should be 
> faster and more robust than BroadcastNestedLoopJoin, we should use 
> CartesianProduct instead of BroadcastNestedLoopJoin, especially when the 
> broadcasted table is not that small.
> Today we hit a query that ran for a very long time without finishing; once we 
> decreased the threshold for broadcast (disabling BroadcastNestedLoopJoin), it 
> finished in seconds.
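A minimal sketch of the mitigation described above, assuming a {{sqlContext}} in 
scope (lowering the threshold steers the planner away from broadcasting the 
larger side):
{code}
// -1 disables broadcast joins entirely; a small positive value only tightens the size limit
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")
{code}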



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13173) Fail to load CSV file with NPE

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13173:
--
Component/s: SQL
 Input/Output

> Fail to load CSV file with NPE
> --
>
> Key: SPARK-13173
> URL: https://issues.apache.org/jira/browse/SPARK-13173
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Reporter: Davies Liu
>
> {code}
> id|end_date|start_date|location
> 1|2015-10-14 00:00:00|2015-09-14 00:00:00|CA-SF
> 2|2015-10-15 01:00:20|2015-08-14 00:00:00|CA-SD
> 3|2015-10-16 02:30:00|2015-01-14 00:00:00|NY-NY
> 4|2015-10-17 03:00:20|2015-02-14 00:00:00|NY-NY
> 5|2015-10-18 04:30:00|2014-04-14 00:00:00|CA-SD
> {code}
> {code}
> adult_df = sqlContext.read.\
> format("org.apache.spark.sql.execution.datasources.csv").\
> option("header", "false").option("delimiter", "|").\
> option("inferSchema", "true").load("/tmp/dataframe_sample.csv")
> {code}
> {code}
> Py4JJavaError: An error occurred while calling o239.load.
> : java.lang.NullPointerException
>   at 
> scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
>   at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
>   at 
> scala.collection.IndexedSeqOptimized$class.zipWithIndex(IndexedSeqOptimized.scala:93)
>   at 
> scala.collection.mutable.ArrayOps$ofRef.zipWithIndex(ArrayOps.scala:108)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation.inferSchema(CSVRelation.scala:137)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation.dataSchema$lzycompute(CSVRelation.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation.dataSchema(CSVRelation.scala:48)
>   at 
> org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:666)
>   at 
> org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:665)
>   at 
> org.apache.spark.sql.execution.datasources.LogicalRelation.(LogicalRelation.scala:39)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:115)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:136)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>   at py4j.Gateway.invoke(Gateway.java:290)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:209)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13174) Add API and options for csv data sources

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13174:
--
Component/s: Input/Output

> Add API and options for csv data sources
> 
>
> Key: SPARK-13174
> URL: https://issues.apache.org/jira/browse/SPARK-13174
> Project: Spark
>  Issue Type: New Feature
>  Components: Input/Output
>Reporter: Davies Liu
>
> We should have an API to load a CSV data source (with some options as 
> arguments), similar to json() and jdbc().
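A hypothetical sketch of what the requested API could look like, mirroring 
json() and jdbc(); the csv() method and the option names shown are assumptions, 
not an API that existed at the time, and sqlContext is assumed to be in scope:
{code}
val df = sqlContext.read
  .option("header", "true")      // assumed option name
  .option("delimiter", "|")      // assumed option name
  .csv("/path/to/data.csv")      // proposed entry point, analogous to json()
{code}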



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12591) NullPointerException using checkpointed mapWithState with KryoSerializer

2016-02-07 Thread Yuval Itzchakov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136208#comment-15136208
 ] 

Yuval Itzchakov edited comment on SPARK-12591 at 2/7/16 10:03 AM:
--

I'm seeing this error as well running Spark 1.6.0 with KryoSerializer.

This error started happening after I reset a spark-worker node while a job is 
running. This hasn't happened previously and doesn't occur unless I restart a 
worker node.
[~zsxwing] - Was this supposed to be fixed in 1.6.0? Or is there still a need 
to manually patch this until 1.6.1?


was (Author: yuval.itzchakov):
I'm seeing this error as well running Spark 1.6.0 with KryoSerializer.

This error started happening after I reset a spark-worker node while a job is 
running. This hasn't happened previously and doesn't occur unless I restart a 
worker node.
[~zsxwing] Was this supposed to be fixed in 1.6.0? Or is there still a need to 
manually patch this until 1.6.1?

> NullPointerException using checkpointed mapWithState with KryoSerializer
> 
>
> Key: SPARK-12591
> URL: https://issues.apache.org/jira/browse/SPARK-12591
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: MacOSX
> Java(TM) SE Runtime Environment (build 1.8.0_20-ea-b17)
>Reporter: Jan Uyttenhove
>Assignee: Shixiong Zhu
> Fix For: 1.6.1, 2.0.0
>
> Attachments: Screen Shot 2016-01-27 at 10.09.18 AM.png
>
>
> Issue occurred after upgrading to the RC4 of Spark (streaming) 1.6.0 to 
> (re)test the new mapWithState API, after previously reporting issue 
> SPARK-11932 (https://issues.apache.org/jira/browse/SPARK-11932). 
> For initial report, see 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-streaming-1-6-0-RC4-NullPointerException-using-mapWithState-tt15830.html
> Narrowed it down to an issue unrelated to Kafka directstream, but, after 
> observing very unpredictable behavior as a result of changes to the Kafka 
> message format, it seems to be related to KryoSerialization specifically.
> For test case, see my modified version of the StatefulNetworkWordCount 
> example: https://gist.github.com/juyttenh/9b4a4103699a7d5f698f 
> To reproduce, use RC4 of Spark-1.6.0 and 
> - start nc:
> {code}nc -lk {code}
> - execute the supplied test case: 
> {code}bin/spark-submit --class 
> org.apache.spark.examples.streaming.StatefulNetworkWordCount --master 
> local[2] file:///some-assembly-jar localhost {code}
> Error scenario:
> - put some text in the nc console with the job running, and observe correct 
> functioning of the word count
> - kill the spark job
> - add some more text in the nc console (with the job not running)
> - restart the spark job and observe the NPE
> (you might need to repeat this a couple of times to trigger the exception)
> Here's the stacktrace: 
> {code}
> 15/12/31 11:43:47 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 5)
> java.lang.NullPointerException
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:103)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:56)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:154)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:148)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at 

[jira] [Reopened] (SPARK-12423) Mesos executor home should not be resolved on the driver's file system

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-12423:
---

> Mesos executor home should not be resolved on the driver's file system
> --
>
> Key: SPARK-12423
> URL: https://issues.apache.org/jira/browse/SPARK-12423
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Iulian Dragos
>
> {{spark.mesos.executor.home}} should be an uninterpreted string. It is very 
> possible that this path does not exist on the driver, and if it does, it may 
> be a symlink that should not be resolved. Currently, this leads to failures 
> in client mode.
> For example, setting it to {{/var/spark/spark-1.6.0-bin-hadoop2.6/}} leads to 
> executors failing:
> {code}
> sh: 1: /private/var/spark/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
> {code}
> {{getCanonicalPath}} transforms {{/var/spark...}} into {{/private/var..}} 
> because on my system there is a symlink from one to the other.
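A small illustration of the resolution behavior described above (on a machine 
where /var is a symlink to /private/var, as in the example):
{code}
// getCanonicalPath follows symlinks, which is why the driver-local path leaks through
new java.io.File("/var/spark/spark-1.6.0-bin-hadoop2.6").getCanonicalPath
// => "/private/var/spark/spark-1.6.0-bin-hadoop2.6" on such a system
{code}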



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5331) Spark workers can't find tachyon master as spark-ec2 doesn't set spark.tachyonStore.url

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5331:
-
Assignee: Shivaram Venkataraman

> Spark workers can't find tachyon master as spark-ec2 doesn't set 
> spark.tachyonStore.url
> ---
>
> Key: SPARK-5331
> URL: https://issues.apache.org/jira/browse/SPARK-5331
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
> Environment: Running on EC2 via modified spark-ec2 scripts (to get 
> dependencies right so tachyon starts)
> Using tachyon 0.5.0 built against hadoop 2.4.1
> Spark 1.2.0 built against tachyon 0.5.0 and hadoop 0.4.1
> Tachyon configured using the template in 0.5.0 but updated with slave list 
> and master variables etc..
>Reporter: Florian Verhein
>Assignee: Shivaram Venkataraman
> Fix For: 1.4.0
>
>
> ps -ef | grep Tachyon 
> shows Tachyon running on the master (and the slave) node with correct setting:
> -Dtachyon.master.hostname=ec2-54-252-156-187.ap-southeast-2.compute.amazonaws.com
> However from stderr log on worker running the SparkTachyonPi example:
> 15/01/20 06:00:56 INFO CacheManager: Partition rdd_0_0 not found, computing it
> 15/01/20 06:00:56 INFO : Trying to connect master @ localhost/127.0.0.1:19998
> 15/01/20 06:00:56 ERROR : Failed to connect (1) to master 
> localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
> 15/01/20 06:00:57 ERROR : Failed to connect (2) to master 
> localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
> 15/01/20 06:00:58 ERROR : Failed to connect (3) to master 
> localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
> 15/01/20 06:00:59 ERROR : Failed to connect (4) to master 
> localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
> 15/01/20 06:01:00 ERROR : Failed to connect (5) to master 
> localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
> 15/01/20 06:01:01 WARN TachyonBlockManager: Attempt 1 to create tachyon dir 
> null failed
> java.io.IOException: Failed to connect to master localhost/127.0.0.1:19998 
> after 5 attempts
>   at tachyon.client.TachyonFS.connect(TachyonFS.java:293)
>   at tachyon.client.TachyonFS.getFileId(TachyonFS.java:1011)
>   at tachyon.client.TachyonFS.exist(TachyonFS.java:633)
>   at 
> org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:117)
>   at 
> org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:106)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>   at 
> org.apache.spark.storage.TachyonBlockManager.createTachyonDirs(TachyonBlockManager.scala:106)
>   at 
> org.apache.spark.storage.TachyonBlockManager.(TachyonBlockManager.scala:57)
>   at 
> org.apache.spark.storage.BlockManager.tachyonStore$lzycompute(BlockManager.scala:94)
>   at 
> org.apache.spark.storage.BlockManager.tachyonStore(BlockManager.scala:88)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:773)
>   at 
> org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
>   at 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:145)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: tachyon.org.apache.thrift.TException: Failed to connect to master 
> localhost/127.0.0.1:19998 after 5 attempts
>   at tachyon.master.MasterClient.connect(MasterClient.java:178)
>   at 

[jira] [Resolved] (SPARK-12423) Mesos executor home should not be resolved on the driver's file system

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12423.
---
   Resolution: Duplicate
Fix Version/s: (was: 2.0.0)

> Mesos executor home should not be resolved on the driver's file system
> --
>
> Key: SPARK-12423
> URL: https://issues.apache.org/jira/browse/SPARK-12423
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Iulian Dragos
>
> {{spark.mesos.executor.home}} should be an uninterpreted string. It is very 
> possible that this path does not exist on the driver, and if it does, it may 
> be a symlink that should not be resolved. Currently, this leads to failures 
> in client mode.
> For example, setting it to {{/var/spark/spark-1.6.0-bin-hadoop2.6/}} leads to 
> executors failing:
> {code}
> sh: 1: /private/var/spark/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
> {code}
> {{getCanonicalPath}} transforms {{/var/spark...}} into {{/private/var..}} 
> because on my system there is a symlink from one to the other.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-5159:
--

> Thrift server does not respect hive.server2.enable.doAs=true
> 
>
> Key: SPARK-5159
> URL: https://issues.apache.org/jira/browse/SPARK-5159
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andrew Ray
> Attachments: spark_thrift_server_log.txt
>
>
> I'm currently testing the spark sql thrift server on a kerberos secured 
> cluster in YARN mode. Currently any user can access any table regardless of 
> HDFS permissions as all data is read as the hive user. In HiveServer2 the 
> property hive.server2.enable.doAs=true causes all access to be done as the 
> submitting user. We should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9975) Add Normalized Closeness Centrality to Spark GraphX

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9975.
--
Resolution: Won't Fix

I don't think anyone is merging GraphX changes at this stage. If it's only an 
improvement, I don't think it will go in.

> Add Normalized Closeness Centrality to Spark GraphX
> ---
>
> Key: SPARK-9975
> URL: https://issues.apache.org/jira/browse/SPARK-9975
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Reporter: Kenny Bastani
>Priority: Minor
>  Labels: features
>
> “Closeness centrality” is also defined as a proportion. First, the distance 
> of a vertex from all other vertices in the network is counted. Normalization 
> is achieved by defining closeness centrality as the number of other vertices 
> divided by this sum (De Nooy et al., 2005, p. 127). Because of this 
> normalization, closeness centrality provides a global measure about the 
> position of a vertex in the network, while betweenness centrality is defined 
> with reference to the local position of a vertex. -- Cited from 
> http://arxiv.org/pdf/0911.2719.pdf
> This request is to add normalized closeness centrality as a core graph 
> algorithm in the GraphX library. I implemented this algorithm for a graph 
> processing extension to Neo4j 
> (https://github.com/kbastani/neo4j-mazerunner#supported-algorithms) and I 
> would like to put it up for review for inclusion into Spark. This algorithm 
> is very straightforward and builds on top of the included ShortestPaths 
> (SSSP) algorithm already in the library.
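For orientation only, a rough sketch of such an algorithm on top of GraphX's 
ShortestPaths; this is not the reporter's implementation, and collecting every 
vertex id as a landmark is only feasible for small graphs:
{code}
import scala.reflect.ClassTag
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ShortestPaths

def closenessCentrality[VD, ED: ClassTag](graph: Graph[VD, ED]): VertexRDD[Double] = {
  val landmarks = graph.vertices.map(_._1).collect().toSeq  // all vertex ids (small graphs only)
  val n = landmarks.size
  ShortestPaths.run(graph, landmarks).vertices.mapValues { spMap =>
    val totalDistance = spMap.values.sum                    // sum of shortest-path lengths
    if (totalDistance > 0) (n - 1).toDouble / totalDistance else 0.0
  }
}
{code}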



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13129) Spark SQL can't query hive table, which is create by Hive HCatalog Streaming API

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13129.
---
Resolution: Not A Problem

> Spark SQL can't query hive table, which is create by Hive HCatalog Streaming 
> API 
> -
>
> Key: SPARK-13129
> URL: https://issues.apache.org/jira/browse/SPARK-13129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: hadoop version: 2.5.0-cdh5.3.2
> hive version: 0.13.1
> spark version: 1.6.0
>Reporter: Tao Li
>  Labels: hive, orc, sparksql
>
> I created a Hive table using the Hive HCatalog Streaming API.
> https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
> The Hive table holds streaming data ingested by the Flume Hive sink, and I can 
> query the table from the Hive command line.
> But I can't query the table from the spark-sql command line. Is this a Spark 
> SQL bug or an unimplemented feature?
> The Hive storage format is ORC with ACID support.
> http://orc.apache.org/docs/acid.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13156) JDBC using multiple partitions creates additional tasks but only executes on one

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136195#comment-15136195
 ] 

Sean Owen commented on SPARK-13156:
---

What do you mean when you say only 1 is querying? Is there anything about your 
app or the driver that prevents concurrent connections?
I suspect this isn't a Spark issue per se. It's launching its 5 tasks.

> JDBC using multiple partitions creates additional tasks but only executes on 
> one
> 
>
> Key: SPARK-13156
> URL: https://issues.apache.org/jira/browse/SPARK-13156
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.5.0
> Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client
>Reporter: Charles Drotar
>
> I can successfully kick off a query through JDBC to Teradata, and when it 
> runs it creates a task on each executor for every partition. The problem is 
> that all of the tasks except for one complete within a couple seconds and the 
> final task handles the entire dataset.
> Example Code:
> {code}
> private val properties = new java.util.Properties()
> properties.setProperty("driver","com.teradata.jdbc.TeraDriver")
> properties.setProperty("username","foo")
> properties.setProperty("password","bar")
> val url = "jdbc:teradata://oneview/, TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10"
> val numPartitions = 5
> val dbTableTemp = "( SELECT  id MOD $numPartitions%d AS modulo, id FROM 
> db.table) AS TEMP_TABLE"
> val partitionColumn = "modulo"
> val lowerBound = 0.toLong
> val upperBound = (numPartitions-1).toLong
> val df = 
> sqlContext.read.jdbc(url,dbTableTemp,partitionColumn,lowerBound,upperBound,numPartitions,properties)
> df.write.parquet("/output/path/for/df/")
> {code}
> When I look at the Spark UI I see the 5 tasks, but only 1 is actually 
> querying.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13120) Shade protobuf-java

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13120.
---
Resolution: Won't Fix

At the moment I don't see an argument for shading this, although I suspect it 
wouldn't hurt. 

> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> See this thread for background information:
> http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136236#comment-15136236
 ] 

Sean Owen commented on SPARK-13127:
---

[~JustinPihony] I suspect this is a good idea, but whenever someone suggests a 
dependency upgrade, the questions are of course: are there incompatible changes? 
Is it compatible with other dependencies? Does it work with all transitive 
dependencies?

Would you mind opening a PR with the change? That will entail running the 
dependency update scripts to check and declare the changed transitive 
dependencies, and then also reviewing the release notes to identify any breaking 
changes we should know about. For 2.0.0 we can tolerate most incompatibilities, 
but it's good to know.

> Upgrade Parquet to 1.9 (Fixes parquet sorting)
> --
>
> Key: SPARK-13127
> URL: https://issues.apache.org/jira/browse/SPARK-13127
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Justin Pihony
>Priority: Minor
>
> Currently, when you write a sorted DataFrame to Parquet, the data read back 
> out is not sorted by default. [This is due to a bug in 
> Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 
> 1.9.
> There is a workaround to read the file back in using a file glob (filepath/*).
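A sketch of the workaround mentioned above; the output path is a placeholder and 
sqlContext is assumed to be in scope:
{code}
// read the Parquet output back through a file glob instead of the directory itself
val df = sqlContext.read.parquet("/path/to/parquet/output/*")
{code}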



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10963) Make KafkaCluster api public

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10963:
--
Assignee: Cody Koeninger

> Make KafkaCluster api public
> 
>
> Key: SPARK-10963
> URL: https://issues.apache.org/jira/browse/SPARK-10963
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Cody Koeninger
>Assignee: Cody Koeninger
>Priority: Minor
> Fix For: 2.0.0
>
>
> Per mailing list discussion, there's enough interest in people using 
> KafkaCluster (e.g. to access the latest offsets) to justify making it public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13211) StreamingContext throws NoSuchElementException when created from non-existent checkpoint directory

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136265#comment-15136265
 ] 

Sean Owen commented on SPARK-13211:
---

Yes, I'm agreeing that there is a code change to make. You do not need the issue assigned to you.

> StreamingContext throws NoSuchElementException when created from non-existent 
> checkpoint directory
> --
>
> Key: SPARK-13211
> URL: https://issues.apache.org/jira/browse/SPARK-13211
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> scala> new StreamingContext("_checkpoint")
> 16/02/05 08:51:10 INFO Checkpoint: Checkpoint directory _checkpoint does not 
> exist
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:108)
>   at 
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:114)
>   ... 43 elided
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9975) Add Normalized Closeness Centrality to Spark GraphX

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136249#comment-15136249
 ] 

Sean Owen commented on SPARK-9975:
--

Yeah, that's why I replied. I have seen a number of PRs for GraphX go 
unreviewed for months, so I'm not sure what's going on there. Unless it's a 
clear and important fix, I don't know that others would review it. I don't know 
if there are more formal conclusions to draw, but I'm trying to save you time.

> Add Normalized Closeness Centrality to Spark GraphX
> ---
>
> Key: SPARK-9975
> URL: https://issues.apache.org/jira/browse/SPARK-9975
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Reporter: Kenny Bastani
>Priority: Minor
>  Labels: features
>
> “Closeness centrality” is also defined as a proportion. First, the distance 
> of a vertex from all other vertices in the network is counted. Normalization 
> is achieved by defining closeness centrality as the number of other vertices 
> divided by this sum (De Nooy et al., 2005, p. 127). Because of this 
> normalization, closeness centrality provides a global measure about the 
> position of a vertex in the network, while betweenness centrality is defined 
> with reference to the local position of a vertex. -- Cited from 
> http://arxiv.org/pdf/0911.2719.pdf
> This request is to add normalized closeness centrality as a core graph 
> algorithm in the GraphX library. I implemented this algorithm for a graph 
> processing extension to Neo4j 
> (https://github.com/kbastani/neo4j-mazerunner#supported-algorithms) and I 
> would like to put it up for review for inclusion into Spark. This algorithm 
> is very straightforward and builds on top of the included ShortestPaths 
> (SSSP) algorithm already in the library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13226) MLLib PowerIteration Clustering depends on deprecated KMeans setRuns API

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13226.
---
Resolution: Duplicate

> MLLib PowerIteration Clustering depends on deprecated KMeans setRuns API
> 
>
> Key: SPARK-13226
> URL: https://issues.apache.org/jira/browse/SPARK-13226
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: holdenk
>Priority: Trivial
>
> The current MLlib PowerIteration clustering implementation sets the number of 
> parallel runs inside the kmeans call to 5. This is deprecated.
> The reference implementation also appears to set either max iterations or a 
> tolerance, both of which are currently left to our kmeans defaults ( 
> http://www.cs.cmu.edu/~wcohen/ )



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13170) Investigate replacing SynchronizedQueue as it is deprecated

2016-02-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13170:


Assignee: Apache Spark

> Investigate replacing SynchronizedQueue as it is deprecated
> ---
>
> Key: SPARK-13170
> URL: https://issues.apache.org/jira/browse/SPARK-13170
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming, Tests
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> In some of our tests we use SynchronizedQueue to append to the queue after 
> creating a queue stream. SynchronizedQueue is deprecated and we should see if 
> we can replace it. This is a bit tricky since the queue stream API is public, 
> and while it doesn't depend on having a SynchronizedQueue as input 
> (thankfully) it does require a Queue. We could possibly change the tests to 
> not depend on the SynchronizedQueue or change the QueueStream to also work 
> with Iterables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13170) Investigate replacing SynchronizedQueue as it is deprecated

2016-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136281#comment-15136281
 ] 

Apache Spark commented on SPARK-13170:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1

> Investigate replacing SynchronizedQueue as it is deprecated
> ---
>
> Key: SPARK-13170
> URL: https://issues.apache.org/jira/browse/SPARK-13170
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming, Tests
>Reporter: holdenk
>Priority: Trivial
>
> In some of our tests we use SynchronizedQueue to append to the queue after 
> creating a queue stream. SynchronizedQueue is deprecated and we should see if 
> we can replace it. This is a bit tricky since the queue stream API is public, 
> and while it doesn't depend on having a SynchronizedQueue as input 
> (thankfully) it does require a Queue. We could possibly change the tests to 
> not depend on the SynchronizedQueue or change the QueueStream to also work 
> with Iterables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13170) Investigate replacing SynchronizedQueue as it is deprecated

2016-02-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13170:


Assignee: (was: Apache Spark)

> Investigate replacing SynchronizedQueue as it is deprecated
> ---
>
> Key: SPARK-13170
> URL: https://issues.apache.org/jira/browse/SPARK-13170
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming, Tests
>Reporter: holdenk
>Priority: Trivial
>
> In some of our tests we use SynchronizedQueue to append to the queue after 
> creating a queue stream. SynchronizedQueue is deprecated and we should see if 
> we can replace it. This is a bit tricky since the queue stream API is public, 
> and while it doesn't depend on having a SynchronizedQueue as input 
> (thankfully) it does require a Queue. We could possibly change the tests to 
> not depend on the SynchronizedQueue or change the QueueStream to also work 
> with Iterables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11472) SparkContext creation error after sc.stop() when Spark is compiled for Hive

2016-02-07 Thread Atkins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136127#comment-15136127
 ] 

Atkins edited comment on SPARK-11472 at 2/7/16 4:19 PM:


Reproduced in a kerberized Hadoop cluster with Spark 1.6.0 compiled with Hive.
It seems that the creation of the new SparkContext uses the old token of the 
stopped SparkContext to connect to the Hive metastore with 
{noformat}AuthenticationMethod.TOKEN{noformat} instead of 
{noformat}AuthenticationMethod.KERBEROS{noformat}.

If you need to create a few SparkContexts in the same JVM one after another, 
here is a workaround (*not recommended*):
{code}
import org.apache.hadoop.io.Text
import org.apache.spark.SparkContext
import scala.reflect.runtime._

val namespace: String = ??? //... your hdfs ha namespace
oldSc.stop

val mirror = 
universe.runtimeMirror(Option(Thread.currentThread().getContextClassLoader).getOrElse(getClass.getClassLoader))
val cUserGroupInformation = 
mirror.classLoader.loadClass("org.apache.hadoop.security.UserGroupInformation")
val mGetCredentialsInternal = {
  val  _mGetCredentialsInternal = 
cUserGroupInformation.getDeclaredMethod("getCredentialsInternal")
  _mGetCredentialsInternal.setAccessible(true)
  _mGetCredentialsInternal
}
val mGetCurrentUser = cUserGroupInformation.getDeclaredMethod("getCurrentUser")
val cCredentials = 
mirror.classLoader.loadClass("org.apache.hadoop.security.Credentials")
val fTokenMap = {
  val _fTokenMap = cCredentials.getDeclaredField("tokenMap")
  _fTokenMap.setAccessible(true)
  _fTokenMap
}
val user = mGetCurrentUser.invoke(null)
user.synchronized {
  val credential = mGetCredentialsInternal.invoke(user)
  val tokenMap = fTokenMap.get(credential).asInstanceOf[java.util.Map[_, _]]
  tokenMap.remove(new Text("hive.server2.delegation.token"))
  tokenMap.remove(new Text(s"ha-hdfs:$namespace"))
}

val newSc = new SparkContext
{code}


was (Author: atkins):
Reproduced in a kerberized Hadoop cluster with Spark 1.6.0 compiled with Hive.
It seems that the creation of the new SparkContext uses the old token of the 
stopped SparkContext to connect to the Hive metastore with 
{noformat}AuthenticationMethod.TOKEN{noformat} instead of 
{noformat}AuthenticationMethod.KERBEROS{noformat}.

If you need to create a few SparkContexts in the same JVM one after another, 
here is a workaround (*not recommended*):
{code}
import org.apache.hadoop.io.Text
import org.apache.spark.SparkContext
import scala.reflect.runtime._

oldSc.stop

val mirror = 
universe.runtimeMirror(Option(Thread.currentThread().getContextClassLoader).getOrElse(getClass.getClassLoader))
val cUserGroupInformation = 
mirror.classLoader.loadClass("org.apache.hadoop.security.UserGroupInformation")
val mGetCredentialsInternal = {
  val  _mGetCredentialsInternal = 
cUserGroupInformation.getDeclaredMethod("getCredentialsInternal")
  _mGetCredentialsInternal.setAccessible(true)
  _mGetCredentialsInternal
}
val mGetCurrentUser = cUserGroupInformation.getDeclaredMethod("getCurrentUser")
val cCredentials = 
mirror.classLoader.loadClass("org.apache.hadoop.security.Credentials")
val fTokenMap = {
  val _fTokenMap = cCredentials.getDeclaredField("tokenMap")
  _fTokenMap.setAccessible(true)
  _fTokenMap
}
val user = mGetCurrentUser.invoke(null)
user.synchronized {
  val credential = mGetCredentialsInternal.invoke(user)
  val tokenMap = fTokenMap.get(credential).asInstanceOf[java.util.Map[_, _]]
  tokenMap.remove(new Text("hive.server2.delegation.token"))
  tokenMap.remove(new Text("ha-hdfs:pubgame"))
}

val newSc = new SparkContext
{code}

> SparkContext creation error after sc.stop() when Spark is compiled for Hive
> ---
>
> Key: SPARK-11472
> URL: https://issues.apache.org/jira/browse/SPARK-11472
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.5.1
> Environment: Red Hat ES 6.7 x86_64
> Spark 1.5.1, Scala 2.10.4, Java 1.7.0_85, Hive 1.2.1
> Authentication done through Kerberos
>Reporter: Pierre Beauvois
>
> Spark 1.5.1 has been compiled with the following command :
> {noformat}
> mvn -Pyarn -Phive -Phive-thriftserver -PsparkR -DskipTests -X clean package
> {noformat}
> After its installation, the file "hive-site.xml" has been added in the conf 
> directory (this is not an hard copy, it's a symbolic link). 
> When the spark-shell is started, the SparkContext and the sqlContext are 
> properly created. Nevertheless, when I stop the SparkContext and then try to 
> create a new one, an error appears. The output of this error is the following:
> {code:title=SparkContextCreationError.scala|borderStyle=solid}
> // imports
> scala> import org.apache.spark.SparkConf
> import org.apache.spark.SparkConf
> scala> import org.apache.spark.SparkContext
> import 

[jira] [Commented] (SPARK-13211) StreamingContext throws NoSuchElementException when created from non-existent checkpoint directory

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136255#comment-15136255
 ] 

Sean Owen commented on SPARK-13211:
---

Yeah I think this code path which returns {{None}} needs to match the behavior 
found at the end of the method, where it causes an exception. The behavior here 
is correct, but the exception isn't great.

{code}
val checkpointFiles = Checkpoint.getCheckpointFiles(checkpointDir, 
Some(fs)).reverse
if (checkpointFiles.isEmpty) {
  return None
}
{code}
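One possible shape of that fix, as a sketch only (raising an explicit error in 
the empty-checkpoint case rather than returning a bare {{None}}; the exact 
message and exception type are assumptions):
{code}
val checkpointFiles = Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).reverse
if (checkpointFiles.isEmpty) {
  throw new SparkException(
    s"No checkpoint files found in checkpoint directory $checkpointDir")
}
{code}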

> StreamingContext throws NoSuchElementException when created from non-existent 
> checkpoint directory
> --
>
> Key: SPARK-13211
> URL: https://issues.apache.org/jira/browse/SPARK-13211
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> scala> new StreamingContext("_checkpoint")
> 16/02/05 08:51:10 INFO Checkpoint: Checkpoint directory _checkpoint does not 
> exist
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:108)
>   at 
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:114)
>   ... 43 elided
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9975) Add Normalized Closeness Centrality to Spark GraphX

2016-02-07 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136220#comment-15136220
 ] 

Stavros Kontopoulos commented on SPARK-9975:


What do you mean no one is merging? Is this an abandoned or frozen library? Let 
me know exactly, so as not to waste any effort... This is a new feature here, 
btw, and I was planning to fix it; it's not an improvement. Betweenness could 
also be a feature. Can we contribute in this area or not? Please let me know.

> Add Normalized Closeness Centrality to Spark GraphX
> ---
>
> Key: SPARK-9975
> URL: https://issues.apache.org/jira/browse/SPARK-9975
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Reporter: Kenny Bastani
>Priority: Minor
>  Labels: features
>
> “Closeness centrality” is also defined as a proportion. First, the distance 
> of a vertex from all other vertices in the network is counted. Normalization 
> is achieved by defining closeness centrality as the number of other vertices 
> divided by this sum (De Nooy et al., 2005, p. 127). Because of this 
> normalization, closeness centrality provides a global measure about the 
> position of a vertex in the network, while betweenness centrality is defined 
> with reference to the local position of a vertex. -- Cited from 
> http://arxiv.org/pdf/0911.2719.pdf
> This request is to add normalized closeness centrality as a core graph 
> algorithm in the GraphX library. I implemented this algorithm for a graph 
> processing extension to Neo4j 
> (https://github.com/kbastani/neo4j-mazerunner#supported-algorithms) and I 
> would like to put it up for review for inclusion into Spark. This algorithm 
> is very straightforward and builds on top of the included ShortestPaths 
> (SSSP) algorithm already in the library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9975) Add Normalized Closeness Centrality to Spark GraphX

2016-02-07 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136220#comment-15136220
 ] 

Stavros Kontopoulos edited comment on SPARK-9975 at 2/7/16 10:56 AM:
-

What do you mean no one is merging? Is this an abandoned or frozen library? Let 
me know exactly, so as not to waste any effort... This is a new feature here, 
btw, and I was planning to fix it; it's not an improvement. Betweenness could 
also be a feature. Can we contribute to this area or not? Please let me know...


was (Author: skonto):
What do you mean no one is merging? Is this an abandoned or frozen library? Let 
me know exactly, so as not to waste any effort... This is a new feature here, 
btw, and I was planning to fix it; it's not an improvement. Betweenness could 
also be a feature. Can we contribute in this area or not? Please let me know.

> Add Normalized Closeness Centrality to Spark GraphX
> ---
>
> Key: SPARK-9975
> URL: https://issues.apache.org/jira/browse/SPARK-9975
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Reporter: Kenny Bastani
>Priority: Minor
>  Labels: features
>
> “Closeness centrality” is also defined as a proportion. First, the distance 
> of a vertex from all other vertices in the network is counted. Normalization 
> is achieved by defining closeness centrality as the number of other vertices 
> divided by this sum (De Nooy et al., 2005, p. 127). Because of this 
> normalization, closeness centrality provides a global measure about the 
> position of a vertex in the network, while betweenness centrality is defined 
> with reference to the local position of a vertex. -- Cited from 
> http://arxiv.org/pdf/0911.2719.pdf
> This request is to add normalized closeness centrality as a core graph 
> algorithm in the GraphX library. I implemented this algorithm for a graph 
> processing extension to Neo4j 
> (https://github.com/kbastani/neo4j-mazerunner#supported-algorithms) and I 
> would like to put it up for review for inclusion into Spark. This algorithm 
> is very straightforward and builds on top of the included ShortestPaths 
> (SSSP) algorithm already in the library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10963) Make KafkaCluster api public

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-10963.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 9007
[https://github.com/apache/spark/pull/9007]

> Make KafkaCluster api public
> 
>
> Key: SPARK-10963
> URL: https://issues.apache.org/jira/browse/SPARK-10963
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Cody Koeninger
>Priority: Minor
> Fix For: 2.0.0
>
>
> Per mailing list discussion, there's enough interest in people using 
> KafkaCluster (e.g. to access latest offsets) to justify making it public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13170) Investigate replacing SynchronizedQueue as it is deprecated

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136273#comment-15136273
 ] 

Sean Owen commented on SPARK-13170:
---

Yeah, tough one. The problem with the API is that it assumes the implementation 
it is given is thread-safe, since the queue is inherently modified by both the 
caller and Spark, yet it does not demand a SynchronizedQueue. Despite the 
recommendation to use ConcurrentLinkedQueue, there's no conversion (?) to/from 
a Scala Queue.

I think the simplest thing is to change the implementation and usages to use a 
plain Queue but synchronize access, and document this. At least, I'll open a PR 
to that effect for discussion.
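As a rough sketch only (not the actual PR), the test side could keep passing a plain mutable.Queue to queueStream and guard its own writes with the queue's monitor; the QueueInputDStream's reads would need to synchronize the same way, which is the part the implementation change would have to cover. The object and variable names below are illustrative, not existing Spark test code.

{code}
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object QueueStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Plain, non-synchronized queue: queueStream only demands a mutable.Queue.
    val queue = mutable.Queue.empty[RDD[Int]]
    ssc.queueStream(queue).print()

    ssc.start()
    // Writes from the test/driver side go through the queue's monitor, which is
    // what the deprecated SynchronizedQueue used to provide implicitly.
    queue.synchronized {
      queue += ssc.sparkContext.makeRDD(1 to 10)
    }
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop()
  }
}
{code}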

> Investigate replacing SynchronizedQueue as it is deprecated
> ---
>
> Key: SPARK-13170
> URL: https://issues.apache.org/jira/browse/SPARK-13170
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming, Tests
>Reporter: holdenk
>Priority: Trivial
>
> In some of our tests we use SynchronizedQueue to append to the queue after 
> creating a queue stream. SynchronizedQueue is deprecated and we should see if 
> we can replace it. This is a bit tricky since the queue stream API is public, 
> and while it doesn't depend on having a SynchronizedQueue as input 
> (thankfully) it does require a Queue. We could possibly change the tests to 
> not depend on the SynchronizedQueue or change the QueueStream to also work 
> with Iterables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13229) When checkpoint interval for ConstantInputDStream is lower than batch duration IllegalArgumentException says it is due to slide time instead

2016-02-07 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-13229:
---

 Summary: When checkpoint interval for ConstantInputDStream is 
lower than batch duration IllegalArgumentException says it is due to slide time 
instead
 Key: SPARK-13229
 URL: https://issues.apache.org/jira/browse/SPARK-13229
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 2.0.0
Reporter: Jacek Laskowski


I have not set the slide time so the requirement failure is not meaningful at 
all to me (and the end user is left confused):

{code}
java.lang.IllegalArgumentException: requirement failed: The checkpoint interval 
for ConstantInputDStream has been set to 1000 ms which is lower than its slide 
time (5000 ms). Please set it to at least 5000 ms.
{code}

Here is the code to reproduce:

{code}
val sc = new SparkContext("local[*]", "Constant Input DStream Demo", new 
SparkConf())
val ssc = new StreamingContext(sc, batchDuration = Seconds(5))
ssc.checkpoint("_checkpoint")
import org.apache.spark.streaming.dstream.ConstantInputDStream
val rdd = sc.parallelize(0 to 9)
val cis = new ConstantInputDStream(ssc, rdd)
cis.checkpoint(interval = Seconds(1))
cis.print
ssc.start
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13229) When checkpoint interval for ConstantInputDStream is lower than batch duration IllegalArgumentException says it is due to slide time instead

2016-02-07 Thread Jacek Laskowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski updated SPARK-13229:

  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

> When checkpoint interval for ConstantInputDStream is lower than batch 
> duration IllegalArgumentException says it is due to slide time instead
> 
>
> Key: SPARK-13229
> URL: https://issues.apache.org/jira/browse/SPARK-13229
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> I have not set the slide time so the requirement failure is not meaningful at 
> all to me (and the end user is left confused):
> {code}
> java.lang.IllegalArgumentException: requirement failed: The checkpoint 
> interval for ConstantInputDStream has been set to 1000 ms which is lower than 
> its slide time (5000 ms). Please set it to at least 5000 ms.
> {code}
> Here is the code to reproduce:
> {code}
> val sc = new SparkContext("local[*]", "Constant Input DStream Demo", new 
> SparkConf())
> val ssc = new StreamingContext(sc, batchDuration = Seconds(5))
> ssc.checkpoint("_checkpoint")
> import org.apache.spark.streaming.dstream.ConstantInputDStream
> val rdd = sc.parallelize(0 to 9)
> val cis = new ConstantInputDStream(ssc, rdd)
> cis.checkpoint(interval = Seconds(1))
> cis.print
> ssc.start
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13190) Update pom.xml to reference Scala 2.11

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13190.
---
Resolution: Duplicate

> Update pom.xml to reference Scala 2.11
> --
>
> Key: SPARK-13190
> URL: https://issues.apache.org/jira/browse/SPARK-13190
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Luciano Resende
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13193) Update Docker tests to use Scala 2.11

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13193.
---
Resolution: Duplicate

> Update Docker tests to use Scala 2.11
> -
>
> Key: SPARK-13193
> URL: https://issues.apache.org/jira/browse/SPARK-13193
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Luciano Resende
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13191) Update LICENSE with Scala 2.11 dependencies

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13191.
---
Resolution: Duplicate

> Update LICENSE with Scala 2.11 dependencies
> ---
>
> Key: SPARK-13191
> URL: https://issues.apache.org/jira/browse/SPARK-13191
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Luciano Resende
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13194) Update release audit tools to use Scala 2.11

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13194.
---
Resolution: Duplicate

> Update release audit tools to use Scala 2.11
> 
>
> Key: SPARK-13194
> URL: https://issues.apache.org/jira/browse/SPARK-13194
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Luciano Resende
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12591) NullPointerException using checkpointed mapWithState with KryoSerializer

2016-02-07 Thread Yuval Itzchakov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136208#comment-15136208
 ] 

Yuval Itzchakov edited comment on SPARK-12591 at 2/7/16 10:47 AM:
--

I'm seeing this error as well, running Spark 1.6.0 with KryoSerializer.

This error started happening after I reset a spark-worker node while a job was 
running. It hasn't happened previously and doesn't occur unless I restart a 
worker node.
[~zsxwing] - I'm assuming this patch needs to be applied manually until 1.6.1 is 
released, am I right?

I've tried registering the Kryo classes via `SparkConf.registerKryoClasses`, and 
I'm still seeing this exception:

val sparkConf = new SparkConf().setAppName(SparkVariables.AppName)
  .setMaster(masterUri)
  .setJars(jars.toArray[String](new Array[String](jars.size())))
  .set("spark.serializer", serializer)
  .set("spark.kryoserializer.buffer.max", serializerMaxBuffer)
  .registerKryoClasses(Array(classOf[WordProcessor], classOf[Message],
    classOf[JavaSerializer]))




was (Author: yuval.itzchakov):
I'm seeing this error as well running Spark 1.6.0 with KryoSerializer.

This error started happening after I reset a spark-worker node while a job is 
running. This hasn't happened previously and doesn't occur unless I restart a 
worker node.
[~zsxwing] - I'm assuming this patch needs to be applied until 1.6.1, am I 
right?

> NullPointerException using checkpointed mapWithState with KryoSerializer
> 
>
> Key: SPARK-12591
> URL: https://issues.apache.org/jira/browse/SPARK-12591
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: MacOSX
> Java(TM) SE Runtime Environment (build 1.8.0_20-ea-b17)
>Reporter: Jan Uyttenhove
>Assignee: Shixiong Zhu
> Fix For: 1.6.1, 2.0.0
>
> Attachments: Screen Shot 2016-01-27 at 10.09.18 AM.png
>
>
> Issue occurred after upgrading to RC4 of Spark (streaming) 1.6.0 to 
> (re)test the new mapWithState API, after previously reporting issue 
> SPARK-11932 (https://issues.apache.org/jira/browse/SPARK-11932). 
> For the initial report, see 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-streaming-1-6-0-RC4-NullPointerException-using-mapWithState-tt15830.html
> Narrowed it down to an issue unrelated to the Kafka direct stream, but, after 
> observing very unpredictable behavior as a result of changes to the Kafka 
> message format, it seems to be related specifically to Kryo serialization.
> For test case, see my modified version of the StatefulNetworkWordCount 
> example: https://gist.github.com/juyttenh/9b4a4103699a7d5f698f 
> To reproduce, use RC4 of Spark-1.6.0 and 
> - start nc:
> {code}nc -lk {code}
> - execute the supplied test case: 
> {code}bin/spark-submit --class 
> org.apache.spark.examples.streaming.StatefulNetworkWordCount --master 
> local[2] file:///some-assembly-jar localhost {code}
> Error scenario:
> - put some text in the nc console with the job running, and observe correct 
> functioning of the word count
> - kill the spark job
> - add some more text in the nc console (with the job not running)
> - restart the spark job and observe the NPE
> (you might need to repeat this a couple of times to trigger the exception)
> Here's the stacktrace: 
> {code}
> 15/12/31 11:43:47 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 5)
> java.lang.NullPointerException
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:103)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:56)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:154)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:148)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at 

[jira] [Commented] (SPARK-13211) StreamingContext throws NoSuchElementException when created from non-existent checkpoint directory

2016-02-07 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136262#comment-15136262
 ] 

Jacek Laskowski commented on SPARK-13211:
-

Can I read it as if you'd approve a pull request for it? :) If so, please 
assign it to me. Thanks.

> StreamingContext throws NoSuchElementException when created from non-existent 
> checkpoint directory
> --
>
> Key: SPARK-13211
> URL: https://issues.apache.org/jira/browse/SPARK-13211
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> scala> new StreamingContext("_checkpoint")
> 16/02/05 08:51:10 INFO Checkpoint: Checkpoint directory _checkpoint does not 
> exist
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:108)
>   at 
> org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:114)
>   ... 43 elided
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13211) StreamingContext throws NoSuchElementException when created from non-existent checkpoint directory

2016-02-07 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136266#comment-15136266
 ] 

Jacek Laskowski commented on SPARK-13211:
-

Thanks! I'm on it.

> StreamingContext throws NoSuchElementException when created from non-existent 
> checkpoint directory
> --
>
> Key: SPARK-13211
> URL: https://issues.apache.org/jira/browse/SPARK-13211
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> scala> new StreamingContext("_checkpoint")
> 16/02/05 08:51:10 INFO Checkpoint: Checkpoint directory _checkpoint does not 
> exist
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:108)
>   at 
> org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:114)
>   ... 43 elided
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12261) pyspark crash for large dataset

2016-02-07 Thread Christopher Bourez (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136229#comment-15136229
 ] 

Christopher Bourez commented on SPARK-12261:


I'm still here if you need any more info about how to reproduce the case

> pyspark crash for large dataset
> ---
>
> Key: SPARK-12261
> URL: https://issues.apache.org/jira/browse/SPARK-12261
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.2
> Environment: windows
>Reporter: zihao
>
> I tried to import a local text file (over 100 MB) via textFile in PySpark; 
> when I ran data.take(), it failed and gave error messages including:
> 15/12/10 17:17:43 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
> aborting job
> Traceback (most recent call last):
>   File "E:/spark_python/test3.py", line 9, in 
> lines.take(5)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\rdd.py", line 1299, 
> in take
> res = self.context.runJob(self, takeUpToNumLeft, p)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\context.py", line 
> 916, in runJob
> port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, 
> partitions)
>   File "C:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 813, in 
> __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\sql\utils.py", line 
> 36, in deco
> return f(*a, **kw)
>   File "C:\Anaconda2\lib\site-packages\py4j\protocol.py", line 308, in 
> get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0, localhost): java.net.SocketException: Connection reset by peer: 
> socket write error
> Then I ran the same code on a small text file, and this time .take() worked 
> fine. How can I solve this problem?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13189) Cleanup build references to Scala 2.10

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13189:
--
Assignee: Luciano Resende
Priority: Minor  (was: Major)

> Cleanup build references to Scala 2.10
> --
>
> Key: SPARK-13189
> URL: https://issues.apache.org/jira/browse/SPARK-13189
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Luciano Resende
>Assignee: Luciano Resende
>Priority: Minor
>
> There are still a few places referencing Scala 2.10/2.10.5 where it should be 
> 2.11/2.11.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13229) When checkpoint interval for ConstantInputDStream is lower than batch duration IllegalArgumentException says it is due to slide time instead

2016-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13229:
--
Priority: Trivial  (was: Minor)

If you're just quibbling with "has been set to", then I agree this can be 
removed. It merely has a certain value, whether because it was set or it was 
the default. The check and error are otherwise correct.
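(For completeness: in the reproduction above, the requirement passes once the checkpoint interval is at least the slide time, which for an input DStream equals the batch duration, e.g.:)

{code}
// The checkpoint interval must be at least the slide time (5000 ms here),
// so Seconds(5) or more satisfies the requirement.
cis.checkpoint(interval = Seconds(5))
{code}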

> When checkpoint interval for ConstantInputDStream is lower than batch 
> duration IllegalArgumentException says it is due to slide time instead
> 
>
> Key: SPARK-13229
> URL: https://issues.apache.org/jira/browse/SPARK-13229
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>
> I have not set the slide time so the requirement failure is not meaningful at 
> all to me (and the end user is left confused):
> {code}
> java.lang.IllegalArgumentException: requirement failed: The checkpoint 
> interval for ConstantInputDStream has been set to 1000 ms which is lower than 
> its slide time (5000 ms). Please set it to at least 5000 ms.
> {code}
> Here is the code to reproduce:
> {code}
> val sc = new SparkContext("local[*]", "Constant Input DStream Demo", new 
> SparkConf())
> val ssc = new StreamingContext(sc, batchDuration = Seconds(5))
> ssc.checkpoint("_checkpoint")
> import org.apache.spark.streaming.dstream.ConstantInputDStream
> val rdd = sc.parallelize(0 to 9)
> val cis = new ConstantInputDStream(ssc, rdd)
> cis.checkpoint(interval = Seconds(1))
> cis.print
> ssc.start
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9763) Minimize exposure of internal SQL classes

2016-02-07 Thread Junyang Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136480#comment-15136480
 ] 

Junyang Shen commented on SPARK-9763:
-

May I ask why the StructType is always defined as "asNullable" in the following 
three places?

Line 109:  StructType(schema.filterNot(f => 
partitionColumns.contains(f.name))).asNullable 
Line 149:  StructType(partitionColumns.map { col =>
  schema.find(_.name == col).getOrElse {
throw new RuntimeException(s"Partition column $col not found in schema 
$schema")
  }
}).asNullable
Line 186: Some(dataSchema.asNullable) 

The "asNullable()" function would override the "StructField.nullable". Is it 
because the inside SQL data has to be nullable? Thanks!

> Minimize exposure of internal SQL classes
> -
>
> Key: SPARK-9763
> URL: https://issues.apache.org/jira/browse/SPARK-9763
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9975) Add Normalized Closeness Centrality to Spark GraphX

2016-02-07 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136339#comment-15136339
 ] 

Stavros Kontopoulos commented on SPARK-9975:


Thank you for answering back. However, I would say that there are three 
committers in this area (correct me if I am wrong); can we bring this up for 
discussion? For example, if there is no time for PR review, we could assist 
with that. I would appreciate talking about this further, since I am sure I am 
not the only one out there who is interested, at least to know more about what 
is happening.

> Add Normalized Closeness Centrality to Spark GraphX
> ---
>
> Key: SPARK-9975
> URL: https://issues.apache.org/jira/browse/SPARK-9975
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Reporter: Kenny Bastani
>Priority: Minor
>  Labels: features
>
> “Closeness centrality” is also defined as a proportion. First, the distance 
> of a vertex from all other vertices in the network is counted. Normalization 
> is achieved by defining closeness centrality as the number of other vertices 
> divided by this sum (De Nooy et al., 2005, p. 127). Because of this 
> normalization, closeness centrality provides a global measure about the 
> position of a vertex in the network, while betweenness centrality is defined 
> with reference to the local position of a vertex. -- Cited from 
> http://arxiv.org/pdf/0911.2719.pdf
> This request is to add normalized closeness centrality as a core graph 
> algorithm in the GraphX library. I implemented this algorithm for a graph 
> processing extension to Neo4j 
> (https://github.com/kbastani/neo4j-mazerunner#supported-algorithms) and I 
> would like to put it up for review for inclusion into Spark. This algorithm 
> is very straightforward and builds on top of the included ShortestPaths 
> (SSSP) algorithm already in the library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13120) Shade protobuf-java

2016-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136320#comment-15136320
 ] 

Ted Yu commented on SPARK-13120:


I do see the advantage of shading protobuf-java: it would improve the user 
experience in cases where wire compatibility across different protobuf versions 
is provided.
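For illustration only, the relocation described in this issue is typically a single shade rule in the build. The sketch below uses the sbt-assembly plugin's ShadeRule; Spark's Maven build would use the maven-shade-plugin's <relocation> equivalent, and the exact rule here is an assumption, not the proposed patch.

{code}
// sbt-assembly sketch: rewrite com.google.protobuf.* references to
// org.spark-project.protobuf.* in the assembled jar (requires the sbt-assembly plugin).
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "org.spark-project.protobuf.@1").inAll
)
{code}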

> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> See this thread for background information:
> http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13120) Shade protobuf-java

2016-02-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-13120:
---
Description: 
https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8
PB2 and PB3 are wire compatible, but the protobuf-java library is not 
compatible across those versions, so the dependency will be a problem.
Shading protobuf-java would provide a better experience for downstream projects.

This issue shades com.google.protobuf:protobuf-java as 
org.spark-project.protobuf

  was:
See this thread for background information:

http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis

This issue shades com.google.protobuf:protobuf-java as 
org.spark-project.protobuf


> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8
> PB2 and PB3 are wire compatible, but the protobuf-java library is not 
> compatible across those versions, so the dependency will be a problem.
> Shading protobuf-java would provide a better experience for downstream 
> projects.
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13120) Shade protobuf-java

2016-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136383#comment-15136383
 ] 

Ted Yu commented on SPARK-13120:


I don't think this JIRA is limited to the scenario from the thread initially 
cited.

I have dropped that from the description.

> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8
> PB2 and PB3 are wire compatible, but the protobuf-java library is not 
> compatible across those versions, so the dependency will be a problem.
> Shading protobuf-java would provide a better experience for downstream 
> projects.
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13120) Shade protobuf-java

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136360#comment-15136360
 ] 

Sean Owen commented on SPARK-13120:
---

What problem does that solve, then? Shading really helps when the API has 
changed across versions. It doesn't fix what third-party libs might provide at 
runtime, though. I like shading in general as a defensive mechanism, but I 
would like to do so only when we have some plausible scenario it fixes. The 
thread at the top doesn't show a protobuf problem of this form.

> Shade protobuf-java
> ---
>
> Key: SPARK-13120
> URL: https://issues.apache.org/jira/browse/SPARK-13120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>
> See this thread for background information:
> http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis
> This issue shades com.google.protobuf:protobuf-java as 
> org.spark-project.protobuf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13026) Umbrella: Allow user to specify initial model when training

2016-02-07 Thread Xusen Yin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xusen Yin updated SPARK-13026:
--
Description: 
Umbrella JIRA for allowing the user to set the initial model when training.

Move [design 
doc|https://docs.google.com/document/d/1LSRQDXOepVsOsCRT_PFwuiS9qmbgCzEskVPKXdqHoX0/edit?usp=sharing]
 here from the duplicated JIRA.

  was:Umbrella JIRA for allowing the user to set the initial model when 
training.


> Umbrella: Allow user to specify initial model when training
> ---
>
> Key: SPARK-13026
> URL: https://issues.apache.org/jira/browse/SPARK-13026
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: holdenk
>
> Umbrella JIRA for allowing the user to set the initial model when training.
> Move [design 
> doc|https://docs.google.com/document/d/1LSRQDXOepVsOsCRT_PFwuiS9qmbgCzEskVPKXdqHoX0/edit?usp=sharing]
>  here from the duplicated JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13026) Umbrella: Allow user to specify initial model when training

2016-02-07 Thread Xusen Yin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xusen Yin updated SPARK-13026:
--
Description: 
Umbrella JIRA for allowing the user to set the initial model when training.

Move the [design 
doc|https://docs.google.com/document/d/1LSRQDXOepVsOsCRT_PFwuiS9qmbgCzEskVPKXdqHoX0/edit?usp=sharing]
 here from the duplicated JIRA.

  was:
Umbrella JIRA for allowing the user to set the initial model when training.

Move [design 
doc|https://docs.google.com/document/d/1LSRQDXOepVsOsCRT_PFwuiS9qmbgCzEskVPKXdqHoX0/edit?usp=sharing]
 here from the duplicated JIRA.


> Umbrella: Allow user to specify initial model when training
> ---
>
> Key: SPARK-13026
> URL: https://issues.apache.org/jira/browse/SPARK-13026
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: holdenk
>
> Umbrella JIRA for allowing the user to set the initial model when training.
> Move the [design 
> doc|https://docs.google.com/document/d/1LSRQDXOepVsOsCRT_PFwuiS9qmbgCzEskVPKXdqHoX0/edit?usp=sharing]
>  here from the duplicated JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12261) pyspark crash for large dataset

2016-02-07 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136515#comment-15136515
 ] 

Josh Rosen commented on SPARK-12261:


[~srowen], I reopened this after it was updated with reproduction instructions 
because it seemed like it might be legitimate and would be worth investigating. 
However, I don't have any spare cycles to investigate this myself.

> pyspark crash for large dataset
> ---
>
> Key: SPARK-12261
> URL: https://issues.apache.org/jira/browse/SPARK-12261
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.2
> Environment: windows
>Reporter: zihao
>
> I tried to import a local text file (over 100 MB) via textFile in PySpark; 
> when I ran data.take(), it failed and gave error messages including:
> 15/12/10 17:17:43 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
> aborting job
> Traceback (most recent call last):
>   File "E:/spark_python/test3.py", line 9, in 
> lines.take(5)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\rdd.py", line 1299, 
> in take
> res = self.context.runJob(self, takeUpToNumLeft, p)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\context.py", line 
> 916, in runJob
> port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, 
> partitions)
>   File "C:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 813, in 
> __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File "D:\spark\spark-1.5.2-bin-hadoop2.6\python\pyspark\sql\utils.py", line 
> 36, in deco
> return f(*a, **kw)
>   File "C:\Anaconda2\lib\site-packages\py4j\protocol.py", line 308, in 
> get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0, localhost): java.net.SocketException: Connection reset by peer: 
> socket write error
> Then I ran the same code on a small text file, and this time .take() worked 
> fine. How can I solve this problem?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12591) NullPointerException using checkpointed mapWithState with KryoSerializer

2016-02-07 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136519#comment-15136519
 ] 

Shixiong Zhu commented on SPARK-12591:
--

Yes. You need to apply this patch for 1.6.0 by yourself. For a workaround, see 
my comment here: 
https://issues.apache.org/jira/browse/SPARK-12591?focusedCommentId=15084412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15084412

> NullPointerException using checkpointed mapWithState with KryoSerializer
> 
>
> Key: SPARK-12591
> URL: https://issues.apache.org/jira/browse/SPARK-12591
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: MacOSX
> Java(TM) SE Runtime Environment (build 1.8.0_20-ea-b17)
>Reporter: Jan Uyttenhove
>Assignee: Shixiong Zhu
> Fix For: 1.6.1, 2.0.0
>
> Attachments: Screen Shot 2016-01-27 at 10.09.18 AM.png
>
>
> Issue occurred after upgrading to RC4 of Spark (streaming) 1.6.0 to 
> (re)test the new mapWithState API, after previously reporting issue 
> SPARK-11932 (https://issues.apache.org/jira/browse/SPARK-11932). 
> For the initial report, see 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-streaming-1-6-0-RC4-NullPointerException-using-mapWithState-tt15830.html
> Narrowed it down to an issue unrelated to the Kafka direct stream, but, after 
> observing very unpredictable behavior as a result of changes to the Kafka 
> message format, it seems to be related specifically to Kryo serialization.
> For test case, see my modified version of the StatefulNetworkWordCount 
> example: https://gist.github.com/juyttenh/9b4a4103699a7d5f698f 
> To reproduce, use RC4 of Spark-1.6.0 and 
> - start nc:
> {code}nc -lk {code}
> - execute the supplied test case: 
> {code}bin/spark-submit --class 
> org.apache.spark.examples.streaming.StatefulNetworkWordCount --master 
> local[2] file:///some-assembly-jar localhost {code}
> Error scenario:
> - put some text in the nc console with the job running, and observe correct 
> functioning of the word count
> - kill the spark job
> - add some more text in the nc console (with the job not running)
> - restart the spark job and observe the NPE
> (you might need to repeat this a couple of times to trigger the exception)
> Here's the stacktrace: 
> {code}
> 15/12/31 11:43:47 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 5)
> java.lang.NullPointerException
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:103)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:56)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:154)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:148)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 

[jira] [Issue Comment Deleted] (SPARK-12720) SQL generation support for cube, rollup, and grouping set

2016-02-07 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-12720:

Comment: was deleted

(was: Since Subquery Alias does not work well, my PR has to wait until 
Spark-13206 is fixed. Thanks!)

> SQL generation support for cube, rollup, and grouping set
> -
>
> Key: SPARK-12720
> URL: https://issues.apache.org/jira/browse/SPARK-12720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>
> {{HiveCompatibilitySuite}} can be useful for bootstrapping test coverage. 
> Please refer to SPARK-11012 for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13230) HashMap.merged not working properly with Spark

2016-02-07 Thread Alin Treznai (JIRA)
Alin Treznai created SPARK-13230:


 Summary: HashMap.merged not working properly with Spark
 Key: SPARK-13230
 URL: https://issues.apache.org/jira/browse/SPARK-13230
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.0
 Environment: Ubuntu 14.04.3, Scala 2.11.7, Spark 1.6.0
Reporter: Alin Treznai


Using HashMap.merged with Spark fails with NullPointerException.

import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.immutable.HashMap

object MergeTest {

  def mergeFn:(HashMap[String, Long], HashMap[String, Long]) => HashMap[String, 
Long] = {
case (m1, m2) => m1.merged(m2){ case (x,y) => (x._1, x._2 + y._2) }
  }

  def empty = HashMap.empty[String,Long]

  def main(args: Array[String]) = {
val input = Seq(HashMap("A" -> 1L), HashMap("A" -> 2L, "B" -> 
3L),HashMap("A" -> 2L, "C" -> 4L))
val conf = new SparkConf().setAppName("MergeTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val result = sc.parallelize(input).reduce(mergeFn)
println(s"Result=$result")
sc.stop()
  }

}

Error message:

org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1169)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at MergeTest$.main(MergeTest.scala:21)
at MergeTest.main(MergeTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: java.lang.NullPointerException
at 
MergeTest$$anonfun$mergeFn$1$$anonfun$apply$1.apply(MergeTest.scala:12)
at 
MergeTest$$anonfun$mergeFn$1$$anonfun$apply$1.apply(MergeTest.scala:12)
at scala.collection.immutable.HashMap$$anon$2.apply(HashMap.scala:148)
at 
scala.collection.immutable.HashMap$HashMap1.updated0(HashMap.scala:200)
at 
scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:322)
at 
scala.collection.immutable.HashMap$HashTrieMap.merge0(HashMap.scala:463)
at scala.collection.immutable.HashMap.merged(HashMap.scala:117)
at MergeTest$$anonfun$mergeFn$1.apply(MergeTest.scala:12)
at MergeTest$$anonfun$mergeFn$1.apply(MergeTest.scala:11)
at 
org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1020)
at 
org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1017)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1165)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13230) HashMap.merged not working properly with Spark

2016-02-07 Thread Alin Treznai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alin Treznai updated SPARK-13230:
-
Description: 
Using HashMap.merged with Spark fails with NullPointerException.

{noformat}
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.immutable.HashMap

object MergeTest {

  def mergeFn:(HashMap[String, Long], HashMap[String, Long]) => HashMap[String, 
Long] = {
case (m1, m2) => m1.merged(m2){ case (x,y) => (x._1, x._2 + y._2) }
  }

  def main(args: Array[String]) = {
val input = Seq(HashMap("A" -> 1L), HashMap("A" -> 2L, "B" -> 
3L),HashMap("A" -> 2L, "C" -> 4L))
val conf = new SparkConf().setAppName("MergeTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val result = sc.parallelize(input).reduce(mergeFn)
println(s"Result=$result")
sc.stop()
  }

}
{noformat}

Error message:

org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1169)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at MergeTest$.main(MergeTest.scala:21)
at MergeTest.main(MergeTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: java.lang.NullPointerException
at 
MergeTest$$anonfun$mergeFn$1$$anonfun$apply$1.apply(MergeTest.scala:12)
at 
MergeTest$$anonfun$mergeFn$1$$anonfun$apply$1.apply(MergeTest.scala:12)
at scala.collection.immutable.HashMap$$anon$2.apply(HashMap.scala:148)
at 
scala.collection.immutable.HashMap$HashMap1.updated0(HashMap.scala:200)
at 
scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:322)
at 
scala.collection.immutable.HashMap$HashTrieMap.merge0(HashMap.scala:463)
at scala.collection.immutable.HashMap.merged(HashMap.scala:117)
at MergeTest$$anonfun$mergeFn$1.apply(MergeTest.scala:12)
at MergeTest$$anonfun$mergeFn$1.apply(MergeTest.scala:11)
at 
org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1020)
at 
org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1017)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1165)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)



  was:
Using HashMap.merged with Spark fails with NullPointerException.

{noformat}
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.immutable.HashMap

object MergeTest {

  def mergeFn:(HashMap[String, Long], HashMap[String, Long]) => HashMap[String, 
Long] = {
case (m1, m2) => m1.merged(m2){ case (x,y) => (x._1, x._2 + y._2) }
  }

  def empty = HashMap.empty[String,Long]

  def main(args: Array[String]) = {
val input = Seq(HashMap("A" -> 1L), HashMap("A" -> 2L, "B" -> 
3L),HashMap("A" -> 2L, "C" -> 4L))
val conf = new SparkConf().setAppName("MergeTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val result = sc.parallelize(input).reduce(mergeFn)
println(s"Result=$result")
sc.stop()
  }

}
{noformat}

Error message:

org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1169)
at 

[jira] [Commented] (SPARK-13201) Make a private non-deprecated version of setRuns

2016-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136651#comment-15136651
 ] 

Apache Spark commented on SPARK-13201:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/2

> Make a private non-deprecated version of setRuns
> 
>
> Key: SPARK-13201
> URL: https://issues.apache.org/jira/browse/SPARK-13201
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> Make a private non-deprecated version of setRuns API so that we can call it 
> from the Python API without deprecation warnings in our own build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13200) Investigate math.round on integer number in MFDataGenerator.scala:109

2016-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136652#comment-15136652
 ] 

Apache Spark commented on SPARK-13200:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/2

> Investigate math.round on integer number in MFDataGenerator.scala:109
> -
>
> Key: SPARK-13200
> URL: https://issues.apache.org/jira/browse/SPARK-13200
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: holdenk
>Priority: Minor
>
> Apparently we are calling round on an integer which now in Scala 2.11 results 
> in a warning (it didn't make any sense before either). Figure out if this is 
> a mistake we can just remove or if we got the types wrong somewhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13201) Make a private non-deprecated version of setRuns

2016-02-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13201:


Assignee: (was: Apache Spark)

> Make a private non-deprecated version of setRuns
> 
>
> Key: SPARK-13201
> URL: https://issues.apache.org/jira/browse/SPARK-13201
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> Make a private non-deprecated version of setRuns API so that we can call it 
> from the Python API without deprecation warnings in our own build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13201) Make a private non-deprecated version of setRuns

2016-02-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13201:


Assignee: Apache Spark

> Make a private non-deprecated version of setRuns
> 
>
> Key: SPARK-13201
> URL: https://issues.apache.org/jira/browse/SPARK-13201
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> Make a private non-deprecated version of setRuns API so that we can call it 
> from the Python API without deprecation warnings in our own build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13230) HashMap.merged not working properly with Spark

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136654#comment-15136654
 ] 

Sean Owen commented on SPARK-13230:
---

That's a weird one. This is a little simpler as 

{code}
sc.parallelize(input).reduce((m1,m2) => m1.merged(m2) { case ((k,v1),(_,v2)) => 
(k, v1+v2) })
{code}

which yields on the driver

{code}
Caused by: scala.MatchError: (null,null) (of class scala.Tuple2)
  at $anonfun$1$$anonfun$apply$1.apply(:28)
  at $anonfun$1$$anonfun$apply$1.apply(:28)
  at scala.collection.immutable.HashMap$$anon$2$$anon$3.apply(HashMap.scala:150)
  at scala.collection.immutable.HashMap$HashMap1.updated0(HashMap.scala:200)
  at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:322)
  at scala.collection.immutable.HashMap$HashMap1.merge0(HashMap.scala:225)
  at scala.collection.immutable.HashMap.merged(HashMap.scala:117)
  at $anonfun$1.apply(:28)
  at $anonfun$1.apply(:28)
  at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:926)
  at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:923)
  at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:57)
  at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1185)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1658)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}

It occurs when reducing the partitions, and I can't figure out how it would be 
fed a null. Collecting or taking the RDD is fine. I suspect something strange 
related to the merged method, which exists only on immutable.HashMap, but 
there's no good reason that should be a problem.

I would suggest trying a different snippet of Scala code to merge the maps for 
now, since that seems to work.
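A minimal sketch of such an alternative, assuming the same semantics as the reporter's mergeFn (values for duplicate keys are summed); it avoids HashMap.merged entirely:

{code}
import scala.collection.immutable.HashMap

// Fold the right-hand map into the left, summing values for duplicate keys.
// Drop-in replacement for the reporter's mergeFn that does not use merged().
def mergeFn: (HashMap[String, Long], HashMap[String, Long]) => HashMap[String, Long] =
  (m1, m2) => m2.foldLeft(m1) { case (acc, (k, v)) =>
    acc.updated(k, acc.getOrElse(k, 0L) + v)
  }
{code}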

> HashMap.merged not working properly with Spark
> --
>
> Key: SPARK-13230
> URL: https://issues.apache.org/jira/browse/SPARK-13230
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
> Environment: Ubuntu 14.04.3, Scala 2.11.7, Spark 1.6.0
>Reporter: Alin Treznai
>
> Using HashMap.merged with Spark fails with NullPointerException.
> {noformat}
> import org.apache.spark.{SparkConf, SparkContext}
> import scala.collection.immutable.HashMap
> object MergeTest {
>   def mergeFn:(HashMap[String, Long], HashMap[String, Long]) => 
> HashMap[String, Long] = {
> case (m1, m2) => m1.merged(m2){ case (x,y) => (x._1, x._2 + y._2) }
>   }
>   def main(args: Array[String]) = {
> val input = Seq(HashMap("A" -> 1L), HashMap("A" -> 2L, "B" -> 
> 3L),HashMap("A" -> 2L, "C" -> 4L))
> val conf = new SparkConf().setAppName("MergeTest").setMaster("local[*]")
> val sc = new SparkContext(conf)
> val result = sc.parallelize(input).reduce(mergeFn)
> println(s"Result=$result")
> sc.stop()
>   }
> }
> {noformat}
> Error message:
> org.apache.spark.SparkDriverExecutionException: Execution error
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1169)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
> at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
> at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
> at MergeTest$.main(MergeTest.scala:21)
> at MergeTest.main(MergeTest.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> Caused by: java.lang.NullPointerException
> at 
> 

[jira] [Updated] (SPARK-13230) HashMap.merged not working properly with Spark

2016-02-07 Thread Alin Treznai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alin Treznai updated SPARK-13230:
-
Description: 
Using HashMap.merged with Spark fails with NullPointerException.

{noformat}
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.immutable.HashMap

object MergeTest {

  def mergeFn:(HashMap[String, Long], HashMap[String, Long]) => HashMap[String, 
Long] = {
case (m1, m2) => m1.merged(m2){ case (x,y) => (x._1, x._2 + y._2) }
  }

  def empty = HashMap.empty[String,Long]

  def main(args: Array[String]) = {
val input = Seq(HashMap("A" -> 1L), HashMap("A" -> 2L, "B" -> 
3L),HashMap("A" -> 2L, "C" -> 4L))
val conf = new SparkConf().setAppName("MergeTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val result = sc.parallelize(input).reduce(mergeFn)
println(s"Result=$result")
sc.stop()
  }

}
{noformat}

Error message:

org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1169)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at MergeTest$.main(MergeTest.scala:21)
at MergeTest.main(MergeTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: java.lang.NullPointerException
at 
MergeTest$$anonfun$mergeFn$1$$anonfun$apply$1.apply(MergeTest.scala:12)
at 
MergeTest$$anonfun$mergeFn$1$$anonfun$apply$1.apply(MergeTest.scala:12)
at scala.collection.immutable.HashMap$$anon$2.apply(HashMap.scala:148)
at 
scala.collection.immutable.HashMap$HashMap1.updated0(HashMap.scala:200)
at 
scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:322)
at 
scala.collection.immutable.HashMap$HashTrieMap.merge0(HashMap.scala:463)
at scala.collection.immutable.HashMap.merged(HashMap.scala:117)
at MergeTest$$anonfun$mergeFn$1.apply(MergeTest.scala:12)
at MergeTest$$anonfun$mergeFn$1.apply(MergeTest.scala:11)
at 
org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1020)
at 
org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1017)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1165)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)



  was:
Using HashMap.merged with Spark fails with NullPointerException.

import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.immutable.HashMap

object MergeTest {

  def mergeFn:(HashMap[String, Long], HashMap[String, Long]) => HashMap[String, 
Long] = {\\
case (m1, m2) => m1.merged(m2){ case (x,y) => (x._1, x._2 + y._2) }\\
  }\\

  def empty = HashMap.empty[String,Long]

  def main(args: Array[String]) = {
val input = Seq(HashMap("A" -> 1L), HashMap("A" -> 2L, "B" -> 
3L),HashMap("A" -> 2L, "C" -> 4L))
val conf = new SparkConf().setAppName("MergeTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val result = sc.parallelize(input).reduce(mergeFn)
println(s"Result=$result")
sc.stop()
  }

}

Error message:

org.apache.spark.SparkDriverExecutionException: Execution error
at 

[jira] [Updated] (SPARK-13230) HashMap.merged not working properly with Spark

2016-02-07 Thread Alin Treznai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alin Treznai updated SPARK-13230:
-
Description: 
Using HashMap.merged with Spark fails with NullPointerException.

import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.immutable.HashMap

object MergeTest {

  def mergeFn:(HashMap[String, Long], HashMap[String, Long]) => HashMap[String, 
Long] = {\\
case (m1, m2) => m1.merged(m2){ case (x,y) => (x._1, x._2 + y._2) }\\
  }\\

  def empty = HashMap.empty[String,Long]

  def main(args: Array[String]) = {
val input = Seq(HashMap("A" -> 1L), HashMap("A" -> 2L, "B" -> 
3L),HashMap("A" -> 2L, "C" -> 4L))
val conf = new SparkConf().setAppName("MergeTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val result = sc.parallelize(input).reduce(mergeFn)
println(s"Result=$result")
sc.stop()
  }

}

Error message:

org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1169)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at MergeTest$.main(MergeTest.scala:21)
at MergeTest.main(MergeTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: java.lang.NullPointerException
at 
MergeTest$$anonfun$mergeFn$1$$anonfun$apply$1.apply(MergeTest.scala:12)
at 
MergeTest$$anonfun$mergeFn$1$$anonfun$apply$1.apply(MergeTest.scala:12)
at scala.collection.immutable.HashMap$$anon$2.apply(HashMap.scala:148)
at 
scala.collection.immutable.HashMap$HashMap1.updated0(HashMap.scala:200)
at 
scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:322)
at 
scala.collection.immutable.HashMap$HashTrieMap.merge0(HashMap.scala:463)
at scala.collection.immutable.HashMap.merged(HashMap.scala:117)
at MergeTest$$anonfun$mergeFn$1.apply(MergeTest.scala:12)
at MergeTest$$anonfun$mergeFn$1.apply(MergeTest.scala:11)
at 
org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1020)
at 
org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1017)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1165)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)



  was:
Using HashMap.merged with Spark fails with NullPointerException.

import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.immutable.HashMap

object MergeTest {

  def mergeFn:(HashMap[String, Long], HashMap[String, Long]) => HashMap[String, 
Long] = {
case (m1, m2) => m1.merged(m2){ case (x,y) => (x._1, x._2 + y._2) }
  }

  def empty = HashMap.empty[String,Long]

  def main(args: Array[String]) = {
val input = Seq(HashMap("A" -> 1L), HashMap("A" -> 2L, "B" -> 
3L),HashMap("A" -> 2L, "C" -> 4L))
val conf = new SparkConf().setAppName("MergeTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val result = sc.parallelize(input).reduce(mergeFn)
println(s"Result=$result")
sc.stop()
  }

}

Error message:

org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1169)
at 

[jira] [Updated] (SPARK-13201) Make a private non-deprecated version of setRuns

2016-02-07 Thread holdenk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk updated SPARK-13201:

Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-13175

> Make a private non-deprecated version of setRuns
> 
>
> Key: SPARK-13201
> URL: https://issues.apache.org/jira/browse/SPARK-13201
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> Make a private, non-deprecated version of the setRuns API so that we can call 
> it from the Python API without deprecation warnings in our own build.


