[jira] [Comment Edited] (SPARK-6678) select count(DISTINCT C_UID) from parquetdir may be can optimize

2016-10-12 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567965#comment-15567965
 ] 

Littlestar edited comment on SPARK-6678 at 10/12/16 7:57 AM:
-

I compared it to my other Spark code, which does the same thing: count distinct C_UID.
C_UID ===> split into 1000 partitions ===> each partition computes its distinct C_UID values ===> sum the total distinct count.


I think "org.apache.spark.rdd.RDD.collect(RDD.scala:813)" is very slow and unnecessary.
I just need the distinct count, not every C_UID value.
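
For reference, a minimal Scala sketch of the per-partition counting approach described above; the parquet path and the assumption that C_UID is a string column are illustrative, not taken from the original job.

{code}
import org.apache.spark.HashPartitioner

// Pull out only the C_UID column; no row values are ever collected to the driver.
val uids = sqlContext.parquetFile("hdfs:///path/to/parquetdir")
  .select("C_UID")
  .rdd
  .map(row => (row.getString(0), 1))

val distinctTotal = uids
  .partitionBy(new HashPartitioner(1000))                        // every C_UID lands in exactly one partition
  .mapPartitions(it => Iterator(it.map(_._1).toSet.size.toLong)) // per-partition distinct count
  .reduce(_ + _)                                                 // only 1000 partial counts reach the driver
{code}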



was (Author: cnstar9988):
I compared it to my other Spark code, which does the same thing: count distinct C_UID.

I think "org.apache.spark.rdd.RDD.collect(RDD.scala:813)" is very slow and unnecessary.
I just need the distinct count, not every C_UID value.


> select count(DISTINCT C_UID) from parquetdir may be can optimize
> 
>
> Key: SPARK-6678
> URL: https://issues.apache.org/jira/browse/SPARK-6678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Littlestar
>Priority: Minor
>
> 2.2 TB of parquet files (5000 files in total, 100 billion records, 2 billion unique C_UID values).
> I run the following SQL; RDD.collect may be very slow:
> select count(DISTINCT C_UID) from parquetdir
> select count(DISTINCT C_UID) from parquetdir
> collect at SparkPlan.scala:83 +details
> org.apache.spark.rdd.RDD.collect(RDD.scala:813)
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:83)
> org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:415)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
> com.sun.proxy.$Proxy23.executeStatementAsync(Unknown Source)
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6678) select count(DISTINCT C_UID) from parquetdir may be can optimize

2016-10-12 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567965#comment-15567965
 ] 

Littlestar commented on SPARK-6678:
---

I compared it to my other Spark code, which does the same thing: count distinct C_UID.

I think "org.apache.spark.rdd.RDD.collect(RDD.scala:813)" is very slow and unnecessary.
I just need the distinct count, not every C_UID value.


> select count(DISTINCT C_UID) from parquetdir may be can optimize
> 
>
> Key: SPARK-6678
> URL: https://issues.apache.org/jira/browse/SPARK-6678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Littlestar
>Priority: Minor
>
> 2.2 TB of parquet files (5000 files in total, 100 billion records, 2 billion unique C_UID values).
> I run the following SQL; RDD.collect may be very slow:
> select count(DISTINCT C_UID) from parquetdir
> select count(DISTINCT C_UID) from parquetdir
> collect at SparkPlan.scala:83 +details
> org.apache.spark.rdd.RDD.collect(RDD.scala:813)
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:83)
> org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:415)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
> com.sun.proxy.$Proxy23.executeStatementAsync(Unknown Source)
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)






[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2015-08-20 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704534#comment-14704534
 ] 

Littlestar commented on SPARK-2883:
---

Spark 1.4.1: the ORC file writer relies on HiveContext and the Hive metastore.
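
A hedged Scala sketch of the dependency noted above: in Spark 1.4.x the ORC data source ships with spark-hive, so writing ORC goes through a HiveContext rather than a plain SQLContext. Paths and the app name below are placeholders.

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("orc-write"))
val hiveContext = new HiveContext(sc)   // a plain SQLContext cannot resolve the "orc" format

val df = hiveContext.read.parquet("hdfs:///path/to/input.parquet")
df.write.format("orc").save("hdfs:///path/to/output.orc")
{code}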



 Spark Support for ORCFile format
 

 Key: SPARK-2883
 URL: https://issues.apache.org/jira/browse/SPARK-2883
 Project: Spark
  Issue Type: New Feature
  Components: Input/Output, SQL
Reporter: Zhan Zhang
Assignee: Zhan Zhang
Priority: Critical
 Fix For: 1.4.0

 Attachments: 2014-09-12 07.05.24 pm Spark UI.png, 2014-09-12 07.07.19 
 pm jobtracker.png, orc.diff


 Verify the support of OrcInputFormat in Spark, fix issues if any exist, and add documentation of its usage.






[jira] [Commented] (SPARK-3644) REST API for Spark application info (jobs / stages / tasks / storage info)

2015-07-09 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620328#comment-14620328
 ] 

Littlestar commented on SPARK-3644:
---

Is there any JSON API reference document for Spark 1.4.0? Thanks.
I found nothing in the Spark 1.4 docs.
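
The endpoints added by this ticket are exposed under /api/v1 on the UI port and documented in the Monitoring page of the Spark 1.4 docs. A minimal Scala snippet to fetch the application list, assuming the default UI port 4040 and a placeholder host name, might look like the following.

{code}
// Fetch the JSON list of applications from the driver's REST endpoint.
val json = scala.io.Source
  .fromURL("http://driver-host:4040/api/v1/applications")
  .mkString
println(json)
{code}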

 REST API for Spark application info (jobs / stages / tasks / storage info)
 --

 Key: SPARK-3644
 URL: https://issues.apache.org/jira/browse/SPARK-3644
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core, Web UI
Reporter: Josh Rosen
Assignee: Imran Rashid
 Fix For: 1.4.0


 This JIRA is a forum to draft a design proposal for a REST interface for 
 accessing information about Spark applications, such as job / stage / task / 
 storage status.
 There have been a number of proposals to serve JSON representations of the 
 information displayed in Spark's web UI.  Given that we might redesign the 
 pages of the web UI (and possibly re-implement the UI as a client of a REST 
 API), the API endpoints and their responses should be independent of what we 
 choose to display on particular web UI pages / layouts.
 Let's start a discussion of what a good REST API would look like from 
 first-principles.  We can discuss what urls / endpoints expose access to 
 data, how our JSON responses will be formatted, how fields will be named, how 
 the API will be documented and tested, etc.
 Some links for inspiration:
 https://developer.github.com/v3/
 http://developer.netflix.com/docs/REST_API_Reference
 https://helloreverb.com/developers/swagger






[jira] [Commented] (SPARK-5281) Registering table on RDD is giving MissingRequirementError

2015-06-28 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604632#comment-14604632
 ] 

Littlestar commented on SPARK-5281:
---

Has this patch been merged to the Spark 1.3 branch? Thanks.


 Registering table on RDD is giving MissingRequirementError
 --

 Key: SPARK-5281
 URL: https://issues.apache.org/jira/browse/SPARK-5281
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0, 1.3.1
Reporter: sarsol
Assignee: Iulian Dragos
Priority: Critical
 Fix For: 1.4.0


 Application crashes on this line {{rdd.registerTempTable(temp)}} in version 1.2 when using sbt or the Eclipse Scala IDE.
 Stacktrace:
 {code}
 Exception in thread main scala.reflect.internal.MissingRequirementError: 
 class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with 
 primordial classloader with boot classpath 
 [C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-library.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-reflect.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-actor.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-swing.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-compiler.jar;C:\Program
  Files\Java\jre7\lib\resources.jar;C:\Program 
 Files\Java\jre7\lib\rt.jar;C:\Program 
 Files\Java\jre7\lib\sunrsasign.jar;C:\Program 
 Files\Java\jre7\lib\jsse.jar;C:\Program 
 Files\Java\jre7\lib\jce.jar;C:\Program 
 Files\Java\jre7\lib\charsets.jar;C:\Program 
 Files\Java\jre7\lib\jfr.jar;C:\Program Files\Java\jre7\classes] not found.
   at 
 scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
   at 
 scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115)
   at 
 scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231)
   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231)
   at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335)
   at scala.reflect.api.Universe.typeOf(Universe.scala:59)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33)
   at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111)
   at 
 com.sar.spark.dq.poc.SparkPOC$delayedInit$body.apply(SparkPOC.scala:43)
   at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
   at 
 scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
   at scala.App$$anonfun$main$1.apply(App.scala:71)
   at scala.App$$anonfun$main$1.apply(App.scala:71)
   at scala.collection.immutable.List.foreach(List.scala:318)
   at 
 scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
   at scala.App$class.main(App.scala:71)
 {code}






[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-04-29 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519515#comment-14519515
 ] 

Littlestar commented on SPARK-6461:
---

same to SPARK-7193

 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I used Mesos to run Spark 1.3.0 ./run-example SparkPi,
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos:
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Resolved] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-04-29 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar resolved SPARK-6461.
---
Resolution: Duplicate

Missing environment variables in spark-1.3.1-hadoop2.4.0.tgz.

Note:
spark-env.sh on the driver node only affects spark-submit (the driver node).

spark-1.3.1-hadoop2.4.0.tgz/conf/spark-env.sh works well for the worker nodes.

 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I used Mesos to run Spark 1.3.0 ./run-example SparkPi,
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos:
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Commented] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release

2015-04-28 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516801#comment-14516801
 ] 

Littlestar commented on SPARK-7193:
---

{noformat}
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1

Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive 
support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in 
stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 
17, hpblade06): ExecutorLostFailure (executor 
20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


{noformat}

 Spark on Mesos may need more tests for spark 1.3.1 release
 

 Key: SPARK-7193
 URL: https://issues.apache.org/jira/browse/SPARK-7193
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.3.1
Reporter: Littlestar

 Spark on Mesos may need more tests for spark 1.3.1 release
 http://spark.apache.org/docs/latest/running-on-mesos.html
 I tested mesos 0.21.1/0.22.0/0.22.1 RC4.
 It just works well with ./bin/spark-shell --master mesos://host:5050.
 Any task that needs more than one node throws the following exceptions.
 {noformat}
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: 
 Lost task 10.3 in stage 0.0 (TID 127, hpblade05): 
 java.lang.IllegalStateException: unread block data
   at 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
   at 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
   at 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:679)
 Driver stacktrace:
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
   at 
 

[jira] [Comment Edited] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release

2015-04-28 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516801#comment-14516801
 ] 

Littlestar edited comment on SPARK-7193 at 4/28/15 10:51 AM:
-

1 master + 7 nodes (spark 1.3.1 + mesos 0.22.0/0.22.1)

{noformat}
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1

Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive 
support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in 
stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 
17, hpblade06): ExecutorLostFailure (executor 
20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


{noformat}


was (Author: cnstar9988):
{noformat}
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1

Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive 
support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in 
stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 
17, hpblade06): ExecutorLostFailure (executor 
20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


{noformat}

 Spark on Mesos may need more tests for spark 1.3.1 release
 

 Key: SPARK-7193
 URL: https://issues.apache.org/jira/browse/SPARK-7193
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.3.1
Reporter: Littlestar

 Spark on Mesos may need more tests for spark 1.3.1 release
 http://spark.apache.org/docs/latest/running-on-mesos.html
 I tested mesos 

[jira] [Comment Edited] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release

2015-04-28 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516801#comment-14516801
 ] 

Littlestar edited comment on SPARK-7193 at 4/28/15 10:53 AM:
-

1 master + 7 nodes (spark 1.3.1 + mesos 0.22.0/0.22.1)

{noformat}
./spark-shell --master mesos://hpblade02:5050

15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1

Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive 
support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in 
stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 
17, hpblade06): ExecutorLostFailure (executor 
20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


{noformat}


was (Author: cnstar9988):
1 master + 7 nodes (spark 1.3.1 + mesos 0.22.0/0.22.1)

{noformat}
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1

Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive 
support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in 
stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 
17, hpblade06): ExecutorLostFailure (executor 
20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


{noformat}

 Spark on Mesos may need more tests for spark 1.3.1 release
 

 Key: SPARK-7193
 URL: https://issues.apache.org/jira/browse/SPARK-7193
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.3.1
Reporter: Littlestar

 Spark on Mesos may need more tests for spark 1.3.1 

[jira] [Commented] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release

2015-04-28 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516807#comment-14516807
 ] 

Littlestar commented on SPARK-7193:
---

Exception from one of the Mesos worker node logs:
{noformat}
Exception in thread main java.lang.NoClassDefFoundError: 
org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.executor.MesosExecutorBackend
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.spark.executor.MesosExecutorBackend. 
Program will exit.
{noformat}

 Spark on Mesos may need more tests for spark 1.3.1 release
 

 Key: SPARK-7193
 URL: https://issues.apache.org/jira/browse/SPARK-7193
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.3.1
Reporter: Littlestar

 Spark on Mesos may need more tests for spark 1.3.1 release
 http://spark.apache.org/docs/latest/running-on-mesos.html
 I tested mesos 0.21.1/0.22.0/0.22.1 RC4.
 It just works well with ./bin/spark-shell --master mesos://host:5050.
 Any task that needs more than one node throws the following exceptions.
 {noformat}
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: 
 Lost task 10.3 in stage 0.0 (TID 127, hpblade05): 
 java.lang.IllegalStateException: unread block data
   at 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
   at 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
   at 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:679)
 Driver stacktrace:
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 15/04/28 15:33:18 ERROR scheduler.LiveListenerBus: Listener 
 EventLoggingListener threw an exception
 java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
   at 
 org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
   at 

[jira] [Resolved] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release

2015-04-28 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar resolved SPARK-7193.
---
Resolution: Invalid

I think the official documentation is missing some notes about Spark on Mesos.

It worked well for me with the following:

Extract spark-1.3.1-bin-hadoop2.4.tgz, modify conf/spark-env.sh, repack it as a new
spark-1.3.1-bin-hadoop2.4.tgz, and then put it on HDFS.

In spark-env.sh, set JAVA_HOME, HADOOP_CONF_DIR, HADOOP_HOME.
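
For illustration only, a sketch of what that could look like; every path and value below is a placeholder, not taken from the original cluster. The repacked archive is then referenced from the driver via spark.executor.uri, so Mesos executors unpack a build whose spark-env.sh already exports the right environment.

{noformat}
# conf/spark-env.sh inside the repacked tarball (example values)
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/opt/hadoop-2.4.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

# on the driver, e.g. in conf/spark-defaults.conf
spark.executor.uri  hdfs:///path/to/spark-1.3.1-bin-hadoop2.4.tgz
{noformat}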






 Spark on Mesos may need more tests for spark 1.3.1 release
 

 Key: SPARK-7193
 URL: https://issues.apache.org/jira/browse/SPARK-7193
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.3.1
Reporter: Littlestar

 Spark on Mesos may need more tests for spark 1.3.1 release
 http://spark.apache.org/docs/latest/running-on-mesos.html
 I tested mesos 0.21.1/0.22.0/0.22.1 RC4.
 It just works well with ./bin/spark-shell --master mesos://host:5050.
 Any task that needs more than one node throws the following exceptions.
 {noformat}
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: 
 Lost task 10.3 in stage 0.0 (TID 127, hpblade05): 
 java.lang.IllegalStateException: unread block data
   at 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
   at 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
   at 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:679)
 Driver stacktrace:
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 15/04/28 15:33:18 ERROR scheduler.LiveListenerBus: Listener 
 EventLoggingListener threw an exception
 java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
   at 
 org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
   at 
 org.apache.spark.scheduler.EventLoggingListener.onStageCompleted(EventLoggingListener.scala:165)
   at 
 org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:32)
   at 
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at 
 

[jira] [Comment Edited] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release

2015-04-28 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518610#comment-14518610
 ] 

Littlestar edited comment on SPARK-7193 at 4/29/15 2:40 AM:


I think the official documentation is missing some notes about Spark on Mesos.

It worked well for me with the following:

Extract spark-1.3.1-bin-hadoop2.4.tgz, modify conf/spark-env.sh, repack it as a new
spark-1.3.1-bin-hadoop2.4.tgz, and then put it on HDFS.

In spark-env.sh, set JAVA_HOME, HADOOP_CONF_DIR, HADOOP_HOME.







was (Author: cnstar9988):
I think official document missing some notes about Spark on Mesos

I worked well with following:

extract spark-1.3.1-bin-hadoop2.4.tgz, and modify conf\spark-env.sh and repack 
with new spark-1.3.1-bin-hadoop2.4.tgz, and then put to hdfs

spark-env.sh set JAVA_HOME, HADOO_CONF_DIR, HADOO_HOME






 Spark on Mesos may need more tests for spark 1.3.1 release
 

 Key: SPARK-7193
 URL: https://issues.apache.org/jira/browse/SPARK-7193
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.3.1
Reporter: Littlestar

 Spark on Mesos may need more tests for spark 1.3.1 release
 http://spark.apache.org/docs/latest/running-on-mesos.html
 I tested mesos 0.21.1/0.22.0/0.22.1 RC4.
 It just works well with ./bin/spark-shell --master mesos://host:5050.
 Any task that needs more than one node throws the following exceptions.
 {noformat}
 Exception in thread main org.apache.spark.SparkException: Job aborted due 
 to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: 
 Lost task 10.3 in stage 0.0 (TID 127, hpblade05): 
 java.lang.IllegalStateException: unread block data
   at 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
   at 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
   at 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:679)
 Driver stacktrace:
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 15/04/28 15:33:18 ERROR scheduler.LiveListenerBus: Listener 
 EventLoggingListener threw an exception
 java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
   at 
 org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
   at scala.Option.foreach(Option.scala:236)
   at 
 

[jira] [Created] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release

2015-04-28 Thread Littlestar (JIRA)
Littlestar created SPARK-7193:
-

 Summary: Spark on Mesos may need more tests for spark 1.3.1 
release
 Key: SPARK-7193
 URL: https://issues.apache.org/jira/browse/SPARK-7193
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.3.1
Reporter: Littlestar


Spark on Mesos may need more tests for spark 1.3.1 release
http://spark.apache.org/docs/latest/running-on-mesos.html

I tested mesos 0.21.1/0.22.0/0.22.1 RC4.
It just works well with ./bin/spark-shell --master mesos://host:5050.
Any task that needs more than one node throws the following exceptions.

{noformat}
Exception in thread main org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: Lost 
task 10.3 in stage 0.0 (TID 127, hpblade05): java.lang.IllegalStateException: 
unread block data
at 
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/04/28 15:33:18 ERROR scheduler.LiveListenerBus: Listener 
EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
at 
org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
at 
org.apache.spark.scheduler.EventLoggingListener.onStageCompleted(EventLoggingListener.scala:165)
at 
org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:32)
at 
org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at 
org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at 
org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
at 
org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
at 
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
at 
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
at 

[jira] [Comment Edited] (SPARK-6151) schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size

2015-04-13 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491881#comment-14491881
 ] 

Littlestar edited comment on SPARK-6151 at 4/14/15 1:04 AM:


The HDFS block size is set once when you first install Hadoop.
The block size can be changed when a file is created, but Spark has no way to pass it through:
FSDataOutputStream org.apache.hadoop.fs.FileSystem.create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize) throws IOException
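
A hedged sketch of one possible workaround, not verified against the reporter's setup: the HDFS client reads its default block size from the Hadoop Configuration, so overriding it on the SparkContext's hadoopConfiguration before the write may control the block size of the files that saveAsParquetFile creates. The property names, the 512 MB size, and the schemaRDD/output path names are illustrative.

{code}
// 512 MB HDFS blocks for files created through this SparkContext's Hadoop configuration
sc.hadoopConfiguration.setLong("dfs.blocksize", 512L * 1024 * 1024)
// keep Parquet row groups roughly aligned with the HDFS block size
sc.hadoopConfiguration.setInt("parquet.block.size", 512 * 1024 * 1024)

schemaRDD.saveAsParquetFile("hdfs:///path/to/output")
{code}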




was (Author: cnstar9988):
The HDFS Block Size is set once when you first install Hadoop.
blockSize can be changed when File create.
FSDataOutputStream org.apache.hadoop.fs.FileSystem.create(Path f, boolean 
overwrite, int bufferSize, short replication, long blockSize) throws IOException



 schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
 ---

 Key: SPARK-6151
 URL: https://issues.apache.org/jira/browse/SPARK-6151
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Trivial

 How can a schemaRDD written with saveAsParquetFile control the HDFS block size?
 Maybe a Configuration option is needed.
 Related questions from others:
 http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-block-size-for-parquet-output-tt21183.html
 http://qnalist.com/questions/5054892/spark-sql-parquet-and-impala






[jira] [Commented] (SPARK-6151) schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size

2015-04-12 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491881#comment-14491881
 ] 

Littlestar commented on SPARK-6151:
---

The HDFS block size is set once when you first install Hadoop.
The block size can be changed when a file is created:
FSDataOutputStream org.apache.hadoop.fs.FileSystem.create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize) throws IOException



 schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
 ---

 Key: SPARK-6151
 URL: https://issues.apache.org/jira/browse/SPARK-6151
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Trivial

 How can a schemaRDD written with saveAsParquetFile control the HDFS block size?
 Maybe a Configuration option is needed.
 Related questions from others:
 http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-block-size-for-parquet-output-tt21183.html
 http://qnalist.com/questions/5054892/spark-sql-parquet-and-impala






[jira] [Created] (SPARK-6678) select count(DISTINCT C_UID) from parquetdir may be can optimize

2015-04-02 Thread Littlestar (JIRA)
Littlestar created SPARK-6678:
-

 Summary: select count(DISTINCT C_UID) from parquetdir may be can 
optimize
 Key: SPARK-6678
 URL: https://issues.apache.org/jira/browse/SPARK-6678
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor


2.2 TB of parquet files (5000 files in total, 100 billion records, 2 billion unique C_UID values).

I run the following SQL; RDD.collect may be very slow:
select count(DISTINCT C_UID) from parquetdir

select count(DISTINCT C_UID) from parquetdir
collect at SparkPlan.scala:83 +details
org.apache.spark.rdd.RDD.collect(RDD.scala:813)
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:83)
org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:415)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
com.sun.proxy.$Proxy23.executeStatementAsync(Unknown Source)
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)










[jira] [Commented] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead

2015-03-30 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387748#comment-14387748
 ] 

Littlestar commented on SPARK-6239:
---

If I want to set minCount=2, I must use .setMinSupport(1.99/(rdd.count())) because of double precision.


How can this be reopened and marked as related to pull/5246? Thanks.
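
A minimal Scala sketch of the rounding problem described above; the record count is illustrative. As quoted in the issue description, FPGrowth derives the absolute threshold as math.ceil(minSupport * count).toLong, so a ratio computed as exactly 2.0 / count can round up to 3 after the multiplication.

{code}
val count = 12345678L
val minSupport = 2.0 / count
val minCount = math.ceil(minSupport * count).toLong         // may come out as 3, not the intended 2

// workaround mentioned above: pass a slightly smaller ratio
val safeMinCount = math.ceil((1.99 / count) * count).toLong // 2
{code}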


 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ---

 Key: SPARK-6239
 URL: https://issues.apache.org/jira/browse/SPARK-6239
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ==
 val minCount = math.ceil(minSupport * count).toLong
 because:
 1. [count] the number of records in the dataset is not known before reading.
 2. [minSupport] double precision.
 from mahout#FPGrowthDriver.java
 addOption(minSupport, s, (Optional) The minimum number of times a 
 co-occurrence must be present.
   +  Default Value: 3, 3);
 I just want to set minCount=2 for a test.
 Thanks.






[jira] [Comment Edited] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead

2015-03-30 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387748#comment-14387748
 ] 

Littlestar edited comment on SPARK-6239 at 3/31/15 1:09 AM:


I would imagine a relative value is usually more useful.
When recnum=12345678 and minsupport=0.003, recnum*minsupport is close to an integer.

Some results that differ only slightly are lost because of double precision.



was (Author: cnstar9988):
If I want to set minCount=2, I must use .setMinSupport(1.99/(rdd.count()))
because of double precision.


How can I reopen this PR and mark its relation to pull/5246? Thanks.


 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ---

 Key: SPARK-6239
 URL: https://issues.apache.org/jira/browse/SPARK-6239
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ==
 val minCount = math.ceil(minSupport * count).toLong
 because:
 1. [count] the number of records in the dataset is not known before reading.
 2. [minSupport] double precision.
 from mahout#FPGrowthDriver.java
 addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present."
   + " Default Value: 3", "3");
 I just want to set minCount=2 for test.
 Thanks.






[jira] [Comment Edited] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375599#comment-14375599
 ] 

Littlestar edited comment on SPARK-1702 at 3/23/15 11:00 AM:
-

I met this on spark 1.3.0 + mesos 0.21.1 with run-example SparkPi

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.executor.MesosExecutorBackend
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.spark.executor.MesosExecutorBackend


was (Author: cnstar9988):
I met this on spark 1.3.0 + mesos 0.21.1 with run-example SparkPi

 Mesos executor won't start because of a ClassNotFoundException
 --

 Key: SPARK-1702
 URL: https://issues.apache.org/jira/browse/SPARK-1702
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.0.0
Reporter: Bouke van der Bijl
  Labels: executors, mesos, spark

 Some discussion here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html
 Fix here (which is probably not the right fix): 
 https://github.com/apache/spark/pull/620
 This was broken in v0.9.0, was fixed in v0.9.1 and is now broken again.
 Error in Mesos executor stderr:
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0502 17:31:42.672224 14688 exec.cpp:131] Version: 0.18.0
 I0502 17:31:42.674959 14707 exec.cpp:205] Executor registered on slave 
 20140501-182306-16842879-5050-10155-0
 14/05/02 17:31:42 INFO MesosExecutorBackend: Using Spark's default log4j 
 profile: org/apache/spark/log4j-defaults.properties
 14/05/02 17:31:42 INFO MesosExecutorBackend: Registered with Mesos as 
 executor ID 20140501-182306-16842879-5050-10155-0
 14/05/02 17:31:43 INFO SecurityManager: Changing view acls to: vagrant
 14/05/02 17:31:43 INFO SecurityManager: SecurityManager, is authentication 
 enabled: false are ui acls enabled: false users with view permissions: 
 Set(vagrant)
 14/05/02 17:31:43 INFO Slf4jLogger: Slf4jLogger started
 14/05/02 17:31:43 INFO Remoting: Starting remoting
 14/05/02 17:31:43 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://spark@localhost:50843]
 14/05/02 17:31:43 INFO Remoting: Remoting now listens on addresses: 
 [akka.tcp://spark@localhost:50843]
 java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:165)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:176)
 at org.apache.spark.executor.Executor.<init>(Executor.scala:106)
 at 
 org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:56)
 Exception in thread "Thread-0" I0502 17:31:43.710039 14707 exec.cpp:412] 
 Deactivating the executor libprocess
 The problem is that it can't find the class. 






[jira] [Comment Edited] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375580#comment-14375580
 ] 

Littlestar edited comment on SPARK-6461 at 3/23/15 9:04 AM:


When I add MESOS_HADOOP_CONF_DIR to both mesos-master-env.sh and 
mesos-slave-env.sh, it throws the following error.
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/executor/MesosExecutorBackend
 Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.executor.MesosExecutorBackend

similar to https://issues.apache.org/jira/browse/SPARK-1702


was (Author: cnstar9988):
When I add MESOS_HADOOP_CONF_DIR to both mesos-master-env.sh and 
mesos-slave-env.sh, it throws the following error.
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/executor/MesosExecutorBackend
 Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.executor.MesosExecutorBackend

similar to https://github.com/apache/spark/pull/620

 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Commented] (SPARK-1480) Choose classloader consistently inside of Spark codebase

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375572#comment-14375572
 ] 

Littlestar commented on SPARK-1480:
---

I hit this bug on spark 1.3.0 + mesos 0.21.1.
It reproduces 100% of the time.

I0323 16:32:18.933440 14504 fetcher.cpp:64] Extracted resource 
'/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S4/frameworks/20150323-152848-1214949568-5050-21134-0009/executors/20150323-100710-1214949568-5050-3453-S4/runs/3d8f22f5-7fed-44ed-b5f9-98a219133911/spark-1.3.0-bin-2.4.0.tar.gz'
 into 
'/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S4/frameworks/20150323-152848-1214949568-5050-21134-0009/executors/20150323-100710-1214949568-5050-3453-S4/runs/3d8f22f5-7fed-44ed-b5f9-98a219133911'
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.executor.MesosExecutorBackend
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.spark.executor.MesosExecutorBackend

 Choose classloader consistently inside of Spark codebase
 

 Key: SPARK-1480
 URL: https://issues.apache.org/jira/browse/SPARK-1480
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 The Spark codebase is not always consistent on which class loader it uses 
 when classloaders are explicitly passed to things like serializers. This 
 caused SPARK-1403 and also causes a bug where when the driver has a modified 
 context class loader it is not translated correctly in local mode to the 
 (local) executor.
 In most cases what we want is the following behavior:
 1. If there is a context classloader on the thread, use that.
 2. Otherwise use the classloader that loaded Spark.
 We should just have a utility function for this and call that function 
 whenever we need to get a classloader.
 Note that SPARK-1403 is a workaround for this exact problem (it sets the 
 context class loader because downstream code assumes it is set). Once this 
 gets fixed in a more general way SPARK-1403 can be reverted.
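
(Illustrative sketch, not part of the ticket: one way the utility described above could look in Scala; the object and method names are invented and may not match Spark's actual implementation.)

{noformat}
// Prefer the thread's context class loader; otherwise fall back to the
// loader that loaded this (i.e. Spark's) class.
object ClassLoaderUtil {
  def getContextOrSparkClassLoader: ClassLoader =
    Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)

  // Resolve a class name using the rule above.
  def classForName(name: String): Class[_] =
    Class.forName(name, true, getContextOrSparkClassLoader)
}
{noformat}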






[jira] [Created] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)
Littlestar created SPARK-6461:
-

 Summary: spark.executorEnv.PATH in spark-defaults.conf is not pass 
to mesos
 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar


I use mesos to run spark 1.3.0 ./run-example SparkPi
but it failed.

spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
spark.executorEnv.PATH
spark.executorEnv.HADOOP_HOME
spark.executorEnv.JAVA_HOME

E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop fs 
-copyToLocal 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
'/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
sh: hadoop: command not found
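
(Illustrative sketch, not from the original report: the same variables set programmatically on a SparkConf; the paths and master URL are hypothetical, and whether these values actually reach the Mesos executors in 1.3.0 is exactly what this ticket questions.)

{noformat}
import org.apache.spark.{SparkConf, SparkContext}

object ExecutorEnvSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("mesos://192.168.1.9:5050")      // hypothetical Mesos master
      .setAppName("SparkPi")
      .set("spark.executorEnv.JAVA_HOME", "/home/test/jdk")
      .set("spark.executorEnv.HADOOP_HOME", "/home/test/hadoop-2.4.0")
      .set("spark.executorEnv.PATH",
        "/home/test/jdk/bin:/home/test/hadoop-2.4.0/bin:/usr/sbin:/usr/bin:/sbin:/bin")
    val sc = new SparkContext(conf)
    // ... job ...
    sc.stop()
  }
}
{noformat}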






[jira] [Comment Edited] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375468#comment-14375468
 ] 

Littlestar edited comment on SPARK-6461 at 3/23/15 8:39 AM:


Each mesos slave node has Java and a Hadoop DataNode.

I also added the following settings to mesos-master-env.sh and mesos-slave-env.sh.
 export MESOS_JAVA_HOME=/home/test/jdk
 export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export MESOS_HADOOP_CONF_DIR=/home/test/hadoop-2.4.0/etc/hadoop
 export 
MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin

 /usr/bin/env: bash: No such file or directory

thanks.



was (Author: cnstar9988):
Each mesos slave node has Java and a Hadoop DataNode.

I also added the following settings to mesos-master-env.sh and mesos-slave-env.sh.
 export MESOS_JAVA_HOME=/home/test/jdk
 export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
 export 
MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin

 /usr/bin/env: bash: No such file or directory

thanks.


 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Comment Edited] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375580#comment-14375580
 ] 

Littlestar edited comment on SPARK-6461 at 3/23/15 8:49 AM:


When I add MESOS_HADOOP_CONF_DIR to both mesos-master-env.sh and 
mesos-slave-env.sh, it throws the following error.
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/executor/MesosExecutorBackend
 Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.executor.MesosExecutorBackend

similar to https://github.com/apache/spark/pull/620


was (Author: cnstar9988):
When I add MESOS_HADOOP_CONF_DIR to both mesos-master-env.sh and 
mesos-slave-env.sh, it throws the following error.
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/executor/MesosExecutorBackend
 Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.executor.MesosExecutorBackend

 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Commented] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375599#comment-14375599
 ] 

Littlestar commented on SPARK-1702:
---

I met this on spark 1.3.0 + mesos 0.21.1

 Mesos executor won't start because of a ClassNotFoundException
 --

 Key: SPARK-1702
 URL: https://issues.apache.org/jira/browse/SPARK-1702
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.0.0
Reporter: Bouke van der Bijl
  Labels: executors, mesos, spark

 Some discussion here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html
 Fix here (which is probably not the right fix): 
 https://github.com/apache/spark/pull/620
 This was broken in v0.9.0, was fixed in v0.9.1 and is now broken again.
 Error in Mesos executor stderr:
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0502 17:31:42.672224 14688 exec.cpp:131] Version: 0.18.0
 I0502 17:31:42.674959 14707 exec.cpp:205] Executor registered on slave 
 20140501-182306-16842879-5050-10155-0
 14/05/02 17:31:42 INFO MesosExecutorBackend: Using Spark's default log4j 
 profile: org/apache/spark/log4j-defaults.properties
 14/05/02 17:31:42 INFO MesosExecutorBackend: Registered with Mesos as 
 executor ID 20140501-182306-16842879-5050-10155-0
 14/05/02 17:31:43 INFO SecurityManager: Changing view acls to: vagrant
 14/05/02 17:31:43 INFO SecurityManager: SecurityManager, is authentication 
 enabled: false are ui acls enabled: false users with view permissions: 
 Set(vagrant)
 14/05/02 17:31:43 INFO Slf4jLogger: Slf4jLogger started
 14/05/02 17:31:43 INFO Remoting: Starting remoting
 14/05/02 17:31:43 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://spark@localhost:50843]
 14/05/02 17:31:43 INFO Remoting: Remoting now listens on addresses: 
 [akka.tcp://spark@localhost:50843]
 java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:165)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:176)
 at org.apache.spark.executor.Executor.<init>(Executor.scala:106)
 at 
 org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:56)
 Exception in thread "Thread-0" I0502 17:31:43.710039 14707 exec.cpp:412] 
 Deactivating the executor libprocess
 The problem is that it can't find the class. 






[jira] [Comment Edited] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375599#comment-14375599
 ] 

Littlestar edited comment on SPARK-1702 at 3/23/15 9:05 AM:


I met this on spark 1.3.0 + mesos 0.21.1 with run-example SparkPi


was (Author: cnstar9988):
I met this on spark 1.3.0 + mesos 0.21.1

 Mesos executor won't start because of a ClassNotFoundException
 --

 Key: SPARK-1702
 URL: https://issues.apache.org/jira/browse/SPARK-1702
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.0.0
Reporter: Bouke van der Bijl
  Labels: executors, mesos, spark

 Some discussion here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html
 Fix here (which is probably not the right fix): 
 https://github.com/apache/spark/pull/620
 This was broken in v0.9.0, was fixed in v0.9.1 and is now broken again.
 Error in Mesos executor stderr:
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0502 17:31:42.672224 14688 exec.cpp:131] Version: 0.18.0
 I0502 17:31:42.674959 14707 exec.cpp:205] Executor registered on slave 
 20140501-182306-16842879-5050-10155-0
 14/05/02 17:31:42 INFO MesosExecutorBackend: Using Spark's default log4j 
 profile: org/apache/spark/log4j-defaults.properties
 14/05/02 17:31:42 INFO MesosExecutorBackend: Registered with Mesos as 
 executor ID 20140501-182306-16842879-5050-10155-0
 14/05/02 17:31:43 INFO SecurityManager: Changing view acls to: vagrant
 14/05/02 17:31:43 INFO SecurityManager: SecurityManager, is authentication 
 enabled: false are ui acls enabled: false users with view permissions: 
 Set(vagrant)
 14/05/02 17:31:43 INFO Slf4jLogger: Slf4jLogger started
 14/05/02 17:31:43 INFO Remoting: Starting remoting
 14/05/02 17:31:43 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://spark@localhost:50843]
 14/05/02 17:31:43 INFO Remoting: Remoting now listens on addresses: 
 [akka.tcp://spark@localhost:50843]
 java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:165)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:176)
 at org.apache.spark.executor.Executor.<init>(Executor.scala:106)
 at 
 org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:56)
 Exception in thread "Thread-0" I0502 17:31:43.710039 14707 exec.cpp:412] 
 Deactivating the executor libprocess
 The problem is that it can't find the class. 






[jira] [Commented] (SPARK-1480) Choose classloader consistently inside of Spark codebase

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375576#comment-14375576
 ] 

Littlestar commented on SPARK-1480:
---

same as https://issues.apache.org/jira/browse/SPARK-6461

 run-example SparkPi can reproduce this bug.

 Choose classloader consistently inside of Spark codebase
 

 Key: SPARK-1480
 URL: https://issues.apache.org/jira/browse/SPARK-1480
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0


 The Spark codebase is not always consistent on which class loader it uses 
 when classloaders are explicitly passed to things like serializers. This 
 caused SPARK-1403 and also causes a bug where when the driver has a modified 
 context class loader it is not translated correctly in local mode to the 
 (local) executor.
 In most cases what we want is the following behavior:
 1. If there is a context classloader on the thread, use that.
 2. Otherwise use the classloader that loaded Spark.
 We should just have a utility function for this and call that function 
 whenever we need to get a classloader.
 Note that SPARK-1403 is a workaround for this exact problem (it sets the 
 context class loader because downstream code assumes it is set). Once this 
 gets fixed in a more general way SPARK-1403 can be reverted.






[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375468#comment-14375468
 ] 

Littlestar commented on SPARK-6461:
---

Each mesos slave node has Java and a Hadoop DataNode.

I also added the following settings to mesos-master-env.sh and mesos-slave-env.sh.
 export MESOS_JAVA_HOME=/home/test/jdk
 export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
 export 
MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin

 /usr/bin/env: bash: No such file or directory

thanks.


 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Commented] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375583#comment-14375583
 ] 

Littlestar commented on SPARK-1403:
---

I want to reopen this bug because I can reproduce it on spark 1.3.0 + mesos 
0.21.1 with run-example SparkPi.


 Spark on Mesos does not set Thread's context class loader
 -

 Key: SPARK-1403
 URL: https://issues.apache.org/jira/browse/SPARK-1403
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
 Environment: ubuntu 12.04 on vagrant
Reporter: Bharath Bhushan
Priority: Blocker
 Fix For: 1.0.0


 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark 
 executor on mesos slave throws a  java.lang.ClassNotFoundException for 
 org.apache.spark.serializer.JavaSerializer.
 The lengthy discussion is here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513






[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375580#comment-14375580
 ] 

Littlestar commented on SPARK-6461:
---

When I add MESOS_HADOOP_CONF_DIR to both mesos-master-env.sh and 
mesos-slave-env.sh, it throws the following error.
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/executor/MesosExecutorBackend
 Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.executor.MesosExecutorBackend

 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Comment Edited] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375468#comment-14375468
 ] 

Littlestar edited comment on SPARK-6461 at 3/23/15 9:29 AM:


Each mesos slave node has Java and a Hadoop DataNode.

Now I add the following settings to mesos-master-env.sh and mesos-slave-env.sh.
 export MESOS_JAVA_HOME=/home/test/jdk
 export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export MESOS_HADOOP_CONF_DIR=/home/test/hadoop-2.4.0/etc/hadoop
 export 
MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin

 /usr/bin/env: bash: No such file or directory

thanks.



was (Author: cnstar9988):
Each mesos slave node has Java and a Hadoop DataNode.

I also added the following settings to mesos-master-env.sh and mesos-slave-env.sh.
 export MESOS_JAVA_HOME=/home/test/jdk
 export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export MESOS_HADOOP_CONF_DIR=/home/test/hadoop-2.4.0/etc/hadoop
 export 
MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin

 /usr/bin/env: bash: No such file or directory

thanks.


 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375478#comment-14375478
 ] 

Littlestar commented on SPARK-6461:
---

In spark/bin, some shell scripts use #!/usr/bin/env bash.

I changed #!/usr/bin/env bash to #!/bin/bash and that worked.

 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375508#comment-14375508
 ] 

Littlestar commented on SPARK-6461:
---

http://spark.apache.org/docs/latest/running-on-mesos.html
It runs OK with ./bin/spark-shell --master mesos://host:5050

but fails with run-example SparkPi.

Has SparkPi on mesos been tested for spark 1.3.0? Thanks.

 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Commented] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375612#comment-14375612
 ] 

Littlestar commented on SPARK-6213:
---

Maybe changing protected[sql] def selectFilters(filters: Seq[Expression]) into 
a public static, Java-callable helper would be an easy way.
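
(Illustrative sketch, not from the thread: since 1.3 the data sources API also exposes the simpler org.apache.spark.sql.sources.Filter classes, which are what selectFilters produces and which are far easier to consume from Java than catalyst Expressions. The relation below is an invented example using them via PrunedFilteredScan.)

{noformat}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types._

// Hypothetical relation: receives already-translated sources.Filter objects
// (EqualTo, GreaterThan, ...) instead of raw catalyst Expressions.
class SimpleRelation(val sqlContext: SQLContext) extends BaseRelation with PrunedFilteredScan {
  override def schema: StructType = StructType(Seq(StructField("value", StringType)))

  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
    filters.foreach {
      case EqualTo(attr, v) => ()   // could be pushed down to the data source
      case _                => ()   // left for Spark to evaluate
    }
    sqlContext.sparkContext.emptyRDD[Row]   // placeholder scan
  }
}
{noformat}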

 sql.catalyst.expressions.Expression is not friendly to java
 ---

 Key: SPARK-6213
 URL: https://issues.apache.org/jira/browse/SPARK-6213
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 sql.sources.CatalystScan#
 public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> 
 filters)
 I use Java to extend BaseRelation, but sql.catalyst.expressions.Expression 
 is not friendly to Java; it can't be iterated from Java, e.g. by NodeName, 
 NodeType, FuncName, FuncArgs.
 DataSourceStrategy.scala#selectFilters
 {noformat}
 /**
  * Selects Catalyst predicate [[Expression]]s which are convertible into
  * data source [[Filter]]s, and convert them.
  */
 protected[sql] def selectFilters(filters: Seq[Expression]) = {
   def translate(predicate: Expression): Option[Filter] = predicate match {
     case expressions.EqualTo(a: Attribute, Literal(v, _)) =>
       Some(sources.EqualTo(a.name, v))
     case expressions.EqualTo(Literal(v, _), a: Attribute) =>
       Some(sources.EqualTo(a.name, v))
     case expressions.GreaterThan(a: Attribute, Literal(v, _)) =>
       Some(sources.GreaterThan(a.name, v))
     case expressions.GreaterThan(Literal(v, _), a: Attribute) =>
       Some(sources.LessThan(a.name, v))
     case expressions.LessThan(a: Attribute, Literal(v, _)) =>
       Some(sources.LessThan(a.name, v))
     case expressions.LessThan(Literal(v, _), a: Attribute) =>
       Some(sources.GreaterThan(a.name, v))
     case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =>
       Some(sources.GreaterThanOrEqual(a.name, v))
     case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =>
       Some(sources.LessThanOrEqual(a.name, v))
     case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =>
       Some(sources.LessThanOrEqual(a.name, v))
     case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =>
       Some(sources.GreaterThanOrEqual(a.name, v))
     case expressions.InSet(a: Attribute, set) =>
       Some(sources.In(a.name, set.toArray))
     case expressions.IsNull(a: Attribute) =>
       Some(sources.IsNull(a.name))
     case expressions.IsNotNull(a: Attribute) =>
       Some(sources.IsNotNull(a.name))
     case expressions.And(left, right) =>
       (translate(left) ++ translate(right)).reduceOption(sources.And)
     case expressions.Or(left, right) =>
       for {
         leftFilter <- translate(left)
         rightFilter <- translate(right)
       } yield sources.Or(leftFilter, rightFilter)
     case expressions.Not(child) =>
       translate(child).map(sources.Not)
     case _ => None
   }
   filters.flatMap(translate)
 }
 {noformat}






[jira] [Issue Comment Deleted] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos

2015-03-23 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6461:
--
Comment: was deleted

(was: In spark/bin, some shell scripts use #!/usr/bin/env bash

I changed #!/usr/bin/env bash to #!/bin/bash and that worked.)

 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 --

 Key: SPARK-6461
 URL: https://issues.apache.org/jira/browse/SPARK-6461
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar

 I use mesos to run spark 1.3.0 ./run-example SparkPi
 but it failed.
 spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
 spark.executorEnv.PATH
 spark.executorEnv.HADOOP_HOME
 spark.executorEnv.JAVA_HOME
 E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found






[jira] [Commented] (SPARK-6240) Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for Large Sets

2015-03-10 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354890#comment-14354890
 ] 

Littlestar commented on SPARK-6240:
---

OK, I see, thanks.
I just wanted to point out that Array[Item] is not a distributed structure.

 Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for 
 Large Sets
 

 Key: SPARK-6240
 URL: https://issues.apache.org/jira/browse/SPARK-6240
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for 
 Large Sets
 {noformat}
 private def genFreqItems[Item: ClassTag](
     data: RDD[Array[Item]],
     minCount: Long,
     partitioner: Partitioner): Array[Item] = {
   data.flatMap { t =>
     val uniq = t.toSet
     if (t.size != uniq.size) {
       throw new SparkException(
         s"Items in a transaction must be unique but got ${t.toSeq}.")
     }
     t
   }.map(v => (v, 1L))
     .reduceByKey(partitioner, _ + _)
     .filter(_._2 >= minCount)
     .collect()
     .sortBy(-_._2)
     .map(_._1)
 }
 {noformat}
 I use 10*1*1 records for testing, to output all co-occurrence pairs.






[jira] [Commented] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead

2015-03-10 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354829#comment-14354829
 ] 

Littlestar commented on SPARK-6239:
---

When using FPGrowthModel, the number of input records is unknown before reading.
I think we should either change the meaning of minSupport or add a setMinCount.

If I want to set minCount=2, I must use .setMinSupport(1.99/(rdd.count()))
because of double precision, since internally
val minCount = math.ceil(minSupport * count).toLong
rounds up with math.ceil.
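
(Illustrative sketch, not from the thread: the 1.99 trick above generalized into a small helper; this is a hypothetical function, not a Spark API, and `transactions` below is an assumed name for the input RDD.)

{noformat}
// Paste into spark-shell / the Scala REPL.
// Pick a relative minSupport so that math.ceil(minSupport * count) lands
// exactly on the desired absolute count (assumes desiredMinCount >= 1).
def minSupportFor(desiredMinCount: Long, recordCount: Long): Double =
  (desiredMinCount - 0.01) / recordCount

// e.g. new FPGrowth().setMinSupport(minSupportFor(2, transactions.count()))
{noformat}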

 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ---

 Key: SPARK-6239
 URL: https://issues.apache.org/jira/browse/SPARK-6239
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ==
 val minCount = math.ceil(minSupport * count).toLong
 because:
 1. [count] the number of records in the dataset is not known before reading.
 2. [minSupport] double precision.
 from mahout#FPGrowthDriver.java
 addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present."
   + " Default Value: 3", "3");
 I just want to set minCount=2 for test.
 Thanks.






[jira] [Updated] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead

2015-03-10 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6239:
--
Description: 
Spark MLlib fpm#FPGrowth minSupport should use long instead

==
val minCount = math.ceil(minSupport * count).toLong
because:
1. [count] the number of records in the dataset is not known before reading.
2. [minSupport] double precision.

from mahout#FPGrowthDriver.java
addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present."
  + " Default Value: 3", "3");


I just want to set minCount=2 for test.
Thanks.

  was:
Spark MLlib fpm#FPGrowth minSupport should use double instead

==
val minCount = math.ceil(minSupport * count).toLong
because:
1. [count] the number of records in the dataset is not known before reading.
2. [minSupport] double precision.

from mahout#FPGrowthDriver.java
addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present."
  + " Default Value: 3", "3");


I just want to set minCount=2 for test.
Thanks.

Summary: Spark MLlib fpm#FPGrowth minSupport should use long instead  
(was: Spark MLlib fpm#FPGrowth minSupport should use double instead)

 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ---

 Key: SPARK-6239
 URL: https://issues.apache.org/jira/browse/SPARK-6239
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ==
 val minCount = math.ceil(minSupport * count).toLong
 because:
 1. [count] the number of records in the dataset is not known before reading.
 2. [minSupport] double precision.
 from mahout#FPGrowthDriver.java
 addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present."
   + " Default Value: 3", "3");
 I just want to set minCount=2 for test.
 Thanks.






[jira] [Created] (SPARK-6240) Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for Large Sets

2015-03-10 Thread Littlestar (JIRA)
Littlestar created SPARK-6240:
-

 Summary: Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may 
outOfMemory for Large Sets
 Key: SPARK-6240
 URL: https://issues.apache.org/jira/browse/SPARK-6240
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor


Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for Large 
Sets

{noformat}
private def genFreqItems[Item: ClassTag](
    data: RDD[Array[Item]],
    minCount: Long,
    partitioner: Partitioner): Array[Item] = {
  data.flatMap { t =>
    val uniq = t.toSet
    if (t.size != uniq.size) {
      throw new SparkException(
        s"Items in a transaction must be unique but got ${t.toSeq}.")
    }
    t
  }.map(v => (v, 1L))
    .reduceByKey(partitioner, _ + _)
    .filter(_._2 >= minCount)
    .collect()
    .sortBy(-_._2)
    .map(_._1)
}
{noformat}

I use 10*1*1 records for testing, to output all co-occurrence pairs.
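
(Illustrative back-of-the-envelope sketch, not from the report; all numbers are made up. It only shows why collecting every frequent item to the driver can hurt when very many items pass minCount.)

{noformat}
object FreqItemsMemorySketch {
  def main(args: Array[String]): Unit = {
    val frequentItems = 2L * 1000 * 1000 * 1000   // hypothetical items passing minCount
    val bytesPerItem  = 48L                       // rough per-object JVM overhead
    println(s"~${frequentItems * bytesPerItem / (1L << 30)} GiB held on the driver")
  }
}
{noformat}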






[jira] [Commented] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead

2015-03-10 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354450#comment-14354450
 ] 

Littlestar commented on SPARK-6239:
---

I just want to set minCount=2 for a test with 10*1*1 records.

FPGrowthModel<String> model = new FPGrowth()
  .setMinSupport(2/(10*1.0*1.0))
  .setNumPartitions(500)
  .run(maps);

I think using minCount is better than minSupport.

 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ---

 Key: SPARK-6239
 URL: https://issues.apache.org/jira/browse/SPARK-6239
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 Spark MLlib fpm#FPGrowth minSupport should use long instead
 ==
 val minCount = math.ceil(minSupport * count).toLong
 because:
 1. [count] the number of records in the dataset is not known before reading.
 2. [minSupport] double precision.
 from mahout#FPGrowthDriver.java
 addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present."
   + " Default Value: 3", "3");
 I just want to set minCount=2 for test.
 Thanks.






[jira] [Created] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use double instead

2015-03-10 Thread Littlestar (JIRA)
Littlestar created SPARK-6239:
-

 Summary: Spark MLlib fpm#FPGrowth minSupport should use double 
instead
 Key: SPARK-6239
 URL: https://issues.apache.org/jira/browse/SPARK-6239
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor


Spark MLlib fpm#FPGrowth minSupport should use double instead

==
val minCount = math.ceil(minSupport * count).toLong
because:
1. [count] the number of records in the dataset is not known before reading.
2. [minSupport] double precision.

from mahout#FPGrowthDriver.java
addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present."
  + " Default Value: 3", "3");


I just want to set minCount=2 for test.
Thanks.






[jira] [Commented] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java

2015-03-07 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351479#comment-14351479
 ] 

Littlestar commented on SPARK-6213:
---

The similar Scala code above is very hard to translate to Java.

 sql.catalyst.expressions.Expression is not friendly to java
 ---

 Key: SPARK-6213
 URL: https://issues.apache.org/jira/browse/SPARK-6213
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 sql.sources.CatalystScan#
 public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> 
 filters)
 I use Java to extend BaseRelation, but sql.catalyst.expressions.Expression 
 is not friendly to Java; it can't be iterated from Java, e.g. by NodeName, 
 NodeType.
 DataSourceStrategy.scala#selectFilters
 {noformat}
 /**
  * Selects Catalyst predicate [[Expression]]s which are convertible into
  * data source [[Filter]]s, and convert them.
  */
 protected[sql] def selectFilters(filters: Seq[Expression]) = {
   def translate(predicate: Expression): Option[Filter] = predicate match {
     case expressions.EqualTo(a: Attribute, Literal(v, _)) =>
       Some(sources.EqualTo(a.name, v))
     case expressions.EqualTo(Literal(v, _), a: Attribute) =>
       Some(sources.EqualTo(a.name, v))
     case expressions.GreaterThan(a: Attribute, Literal(v, _)) =>
       Some(sources.GreaterThan(a.name, v))
     case expressions.GreaterThan(Literal(v, _), a: Attribute) =>
       Some(sources.LessThan(a.name, v))
     case expressions.LessThan(a: Attribute, Literal(v, _)) =>
       Some(sources.LessThan(a.name, v))
     case expressions.LessThan(Literal(v, _), a: Attribute) =>
       Some(sources.GreaterThan(a.name, v))
     case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =>
       Some(sources.GreaterThanOrEqual(a.name, v))
     case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =>
       Some(sources.LessThanOrEqual(a.name, v))
     case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =>
       Some(sources.LessThanOrEqual(a.name, v))
     case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =>
       Some(sources.GreaterThanOrEqual(a.name, v))
     case expressions.InSet(a: Attribute, set) =>
       Some(sources.In(a.name, set.toArray))
     case expressions.IsNull(a: Attribute) =>
       Some(sources.IsNull(a.name))
     case expressions.IsNotNull(a: Attribute) =>
       Some(sources.IsNotNull(a.name))
     case expressions.And(left, right) =>
       (translate(left) ++ translate(right)).reduceOption(sources.And)
     case expressions.Or(left, right) =>
       for {
         leftFilter <- translate(left)
         rightFilter <- translate(right)
       } yield sources.Or(leftFilter, rightFilter)
     case expressions.Not(child) =>
       translate(child).map(sources.Not)
     case _ => None
   }
   filters.flatMap(translate)
 }
 {noformat}






[jira] [Updated] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java

2015-03-07 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6213:
--
Description: 
sql.sources.CatalystScan#
public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> 
filters)

I use Java to extend BaseRelation, but sql.catalyst.expressions.Expression is 
not friendly to Java; it can't be iterated from Java, e.g. by NodeName, NodeType, 
FunctionName, FuncArgs.

DataSourceStrategy.scala#selectFilters
{noformat}
/**
 * Selects Catalyst predicate [[Expression]]s which are convertible into
 * data source [[Filter]]s, and convert them.
 */
protected[sql] def selectFilters(filters: Seq[Expression]) = {
  def translate(predicate: Expression): Option[Filter] = predicate match {
    case expressions.EqualTo(a: Attribute, Literal(v, _)) =>
      Some(sources.EqualTo(a.name, v))
    case expressions.EqualTo(Literal(v, _), a: Attribute) =>
      Some(sources.EqualTo(a.name, v))
    case expressions.GreaterThan(a: Attribute, Literal(v, _)) =>
      Some(sources.GreaterThan(a.name, v))
    case expressions.GreaterThan(Literal(v, _), a: Attribute) =>
      Some(sources.LessThan(a.name, v))
    case expressions.LessThan(a: Attribute, Literal(v, _)) =>
      Some(sources.LessThan(a.name, v))
    case expressions.LessThan(Literal(v, _), a: Attribute) =>
      Some(sources.GreaterThan(a.name, v))
    case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =>
      Some(sources.GreaterThanOrEqual(a.name, v))
    case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =>
      Some(sources.LessThanOrEqual(a.name, v))
    case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =>
      Some(sources.LessThanOrEqual(a.name, v))
    case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =>
      Some(sources.GreaterThanOrEqual(a.name, v))
    case expressions.InSet(a: Attribute, set) =>
      Some(sources.In(a.name, set.toArray))
    case expressions.IsNull(a: Attribute) =>
      Some(sources.IsNull(a.name))
    case expressions.IsNotNull(a: Attribute) =>
      Some(sources.IsNotNull(a.name))
    case expressions.And(left, right) =>
      (translate(left) ++ translate(right)).reduceOption(sources.And)
    case expressions.Or(left, right) =>
      for {
        leftFilter <- translate(left)
        rightFilter <- translate(right)
      } yield sources.Or(leftFilter, rightFilter)
    case expressions.Not(child) =>
      translate(child).map(sources.Not)
    case _ => None
  }

  filters.flatMap(translate)
}
{noformat}


  was:
sql.sources.CatalystScan#
public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> 
filters)

I use Java to extend BaseRelation, but sql.catalyst.expressions.Expression is 
not friendly to Java; it can't be iterated from Java, e.g. by NodeName, 
NodeType.

DataSourceStrategy.scala#selectFilters
{noformat}

/**
   * Selects Catalyst predicate [[Expression]]s which are convertible into data 
source [[Filter]]s,
   * and convert them.
   */
  protected[sql] def selectFilters(filters: Seq[Expression]) = {
def translate(predicate: Expression): Option[Filter] = predicate match {
  case expressions.EqualTo(a: Attribute, Literal(v, _)) =
Some(sources.EqualTo(a.name, v))
  case expressions.EqualTo(Literal(v, _), a: Attribute) =
Some(sources.EqualTo(a.name, v))

  case expressions.GreaterThan(a: Attribute, Literal(v, _)) =
Some(sources.GreaterThan(a.name, v))
  case expressions.GreaterThan(Literal(v, _), a: Attribute) =
Some(sources.LessThan(a.name, v))

  case expressions.LessThan(a: Attribute, Literal(v, _)) =
Some(sources.LessThan(a.name, v))
  case expressions.LessThan(Literal(v, _), a: Attribute) =
Some(sources.GreaterThan(a.name, v))

  case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =
Some(sources.GreaterThanOrEqual(a.name, v))
  case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =
Some(sources.LessThanOrEqual(a.name, v))

  case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =
Some(sources.LessThanOrEqual(a.name, v))
  case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =
Some(sources.GreaterThanOrEqual(a.name, v))

  case expressions.InSet(a: Attribute, set) =
Some(sources.In(a.name, set.toArray))

  case expressions.IsNull(a: Attribute) =
Some(sources.IsNull(a.name))
  case expressions.IsNotNull(a: Attribute) =
Some(sources.IsNotNull(a.name))

  case expressions.And(left, right) =
(translate(left) ++ translate(right)).reduceOption(sources.And)

  case expressions.Or(left, right) =
for {
  leftFilter - translate(left)
  rightFilter - translate(right)
} yield sources.Or(leftFilter, rightFilter)

 

[jira] [Updated] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java

2015-03-07 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6213:
--
Description: 
sql.sources.CatalystScan#
public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> 
filters)

I use Java to extend BaseRelation, but sql.catalyst.expressions.Expression is 
not friendly to Java; it can't be iterated from Java, e.g. by NodeName, NodeType, 
FuncName, FuncArgs.

DataSourceStrategy.scala#selectFilters
{noformat}
/**
 * Selects Catalyst predicate [[Expression]]s which are convertible into
 * data source [[Filter]]s, and convert them.
 */
protected[sql] def selectFilters(filters: Seq[Expression]) = {
  def translate(predicate: Expression): Option[Filter] = predicate match {
    case expressions.EqualTo(a: Attribute, Literal(v, _)) =>
      Some(sources.EqualTo(a.name, v))
    case expressions.EqualTo(Literal(v, _), a: Attribute) =>
      Some(sources.EqualTo(a.name, v))
    case expressions.GreaterThan(a: Attribute, Literal(v, _)) =>
      Some(sources.GreaterThan(a.name, v))
    case expressions.GreaterThan(Literal(v, _), a: Attribute) =>
      Some(sources.LessThan(a.name, v))
    case expressions.LessThan(a: Attribute, Literal(v, _)) =>
      Some(sources.LessThan(a.name, v))
    case expressions.LessThan(Literal(v, _), a: Attribute) =>
      Some(sources.GreaterThan(a.name, v))
    case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =>
      Some(sources.GreaterThanOrEqual(a.name, v))
    case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =>
      Some(sources.LessThanOrEqual(a.name, v))
    case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =>
      Some(sources.LessThanOrEqual(a.name, v))
    case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =>
      Some(sources.GreaterThanOrEqual(a.name, v))
    case expressions.InSet(a: Attribute, set) =>
      Some(sources.In(a.name, set.toArray))
    case expressions.IsNull(a: Attribute) =>
      Some(sources.IsNull(a.name))
    case expressions.IsNotNull(a: Attribute) =>
      Some(sources.IsNotNull(a.name))
    case expressions.And(left, right) =>
      (translate(left) ++ translate(right)).reduceOption(sources.And)
    case expressions.Or(left, right) =>
      for {
        leftFilter <- translate(left)
        rightFilter <- translate(right)
      } yield sources.Or(leftFilter, rightFilter)
    case expressions.Not(child) =>
      translate(child).map(sources.Not)
    case _ => None
  }

  filters.flatMap(translate)
}
{noformat}


  was:
sql.sources.CatalystScan#
public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> filters)

I use java to extends BaseRelation, but sql.catalyst.expressions.Expression is 
not friendly to java, it's can't  iterated by java, such as NodeName, NodeType, 
FunctionName, FuncArgs.

DataSourceStrategy.scala#selectFilters
{noformat}

/**
   * Selects Catalyst predicate [[Expression]]s which are convertible into data source [[Filter]]s,
   * and convert them.
   */
  protected[sql] def selectFilters(filters: Seq[Expression]) = {
    def translate(predicate: Expression): Option[Filter] = predicate match {
      case expressions.EqualTo(a: Attribute, Literal(v, _)) =>
        Some(sources.EqualTo(a.name, v))
      case expressions.EqualTo(Literal(v, _), a: Attribute) =>
        Some(sources.EqualTo(a.name, v))

      case expressions.GreaterThan(a: Attribute, Literal(v, _)) =>
        Some(sources.GreaterThan(a.name, v))
      case expressions.GreaterThan(Literal(v, _), a: Attribute) =>
        Some(sources.LessThan(a.name, v))

      case expressions.LessThan(a: Attribute, Literal(v, _)) =>
        Some(sources.LessThan(a.name, v))
      case expressions.LessThan(Literal(v, _), a: Attribute) =>
        Some(sources.GreaterThan(a.name, v))

      case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =>
        Some(sources.GreaterThanOrEqual(a.name, v))
      case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =>
        Some(sources.LessThanOrEqual(a.name, v))

      case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =>
        Some(sources.LessThanOrEqual(a.name, v))
      case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =>
        Some(sources.GreaterThanOrEqual(a.name, v))

      case expressions.InSet(a: Attribute, set) =>
        Some(sources.In(a.name, set.toArray))

      case expressions.IsNull(a: Attribute) =>
        Some(sources.IsNull(a.name))
      case expressions.IsNotNull(a: Attribute) =>
        Some(sources.IsNotNull(a.name))

      case expressions.And(left, right) =>
        (translate(left) ++ translate(right)).reduceOption(sources.And)

      case expressions.Or(left, right) =>
        for {
          leftFilter <- translate(left)
          rightFilter <- translate(right)
        } yield sources.Or(leftFilter, rightFilter)

      case expressions.Not(child) =>
        translate(child).map(sources.Not)

      case _ => None
    }

    filters.flatMap(translate)
  }

{noformat}

[jira] [Created] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java

2015-03-06 Thread Littlestar (JIRA)
Littlestar created SPARK-6213:
-

 Summary: sql.catalyst.expressions.Expression is not friendly to 
java
 Key: SPARK-6213
 URL: https://issues.apache.org/jira/browse/SPARK-6213
 Project: Spark
  Issue Type: Improvement
Reporter: Littlestar
Priority: Minor


sql.sources.CatalystScan#
public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> filters)

I use Java to extend BaseRelation, but sql.catalyst.expressions.Expression is not friendly to Java; it can't be iterated from Java to get things such as NodeName or NodeType.

DataSourceStrategy.scala#selectFilters
{noformat}

/**
   * Selects Catalyst predicate [[Expression]]s which are convertible into data source [[Filter]]s,
   * and convert them.
   */
  protected[sql] def selectFilters(filters: Seq[Expression]) = {
    def translate(predicate: Expression): Option[Filter] = predicate match {
      case expressions.EqualTo(a: Attribute, Literal(v, _)) =>
        Some(sources.EqualTo(a.name, v))
      case expressions.EqualTo(Literal(v, _), a: Attribute) =>
        Some(sources.EqualTo(a.name, v))

      case expressions.GreaterThan(a: Attribute, Literal(v, _)) =>
        Some(sources.GreaterThan(a.name, v))
      case expressions.GreaterThan(Literal(v, _), a: Attribute) =>
        Some(sources.LessThan(a.name, v))

      case expressions.LessThan(a: Attribute, Literal(v, _)) =>
        Some(sources.LessThan(a.name, v))
      case expressions.LessThan(Literal(v, _), a: Attribute) =>
        Some(sources.GreaterThan(a.name, v))

      case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =>
        Some(sources.GreaterThanOrEqual(a.name, v))
      case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =>
        Some(sources.LessThanOrEqual(a.name, v))

      case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =>
        Some(sources.LessThanOrEqual(a.name, v))
      case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =>
        Some(sources.GreaterThanOrEqual(a.name, v))

      case expressions.InSet(a: Attribute, set) =>
        Some(sources.In(a.name, set.toArray))

      case expressions.IsNull(a: Attribute) =>
        Some(sources.IsNull(a.name))
      case expressions.IsNotNull(a: Attribute) =>
        Some(sources.IsNotNull(a.name))

      case expressions.And(left, right) =>
        (translate(left) ++ translate(right)).reduceOption(sources.And)

      case expressions.Or(left, right) =>
        for {
          leftFilter <- translate(left)
          rightFilter <- translate(right)
        } yield sources.Or(leftFilter, rightFilter)

      case expressions.Not(child) =>
        translate(child).map(sources.Not)

      case _ => None
    }

    filters.flatMap(translate)
  }

{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java

2015-03-06 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6213:
--
  Component/s: SQL
Affects Version/s: 1.3.0

 sql.catalyst.expressions.Expression is not friendly to java
 ---

 Key: SPARK-6213
 URL: https://issues.apache.org/jira/browse/SPARK-6213
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 sql.sources.CatalystScan#
 public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> filters)
 I use java to extends BaseRelation, but sql.catalyst.expressions.Expression 
 is not friendly to java, it's can't  iterated by java, such as NodeName, 
 NodeType.
 DataSourceStrategy.scala#selectFilters
 {noformat}
 /**
  * Selects Catalyst predicate [[Expression]]s which are convertible into data source [[Filter]]s,
  * and convert them.
  */
 protected[sql] def selectFilters(filters: Seq[Expression]) = {
   def translate(predicate: Expression): Option[Filter] = predicate match {
     case expressions.EqualTo(a: Attribute, Literal(v, _)) =>
       Some(sources.EqualTo(a.name, v))
     case expressions.EqualTo(Literal(v, _), a: Attribute) =>
       Some(sources.EqualTo(a.name, v))
     case expressions.GreaterThan(a: Attribute, Literal(v, _)) =>
       Some(sources.GreaterThan(a.name, v))
     case expressions.GreaterThan(Literal(v, _), a: Attribute) =>
       Some(sources.LessThan(a.name, v))
     case expressions.LessThan(a: Attribute, Literal(v, _)) =>
       Some(sources.LessThan(a.name, v))
     case expressions.LessThan(Literal(v, _), a: Attribute) =>
       Some(sources.GreaterThan(a.name, v))
     case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =>
       Some(sources.GreaterThanOrEqual(a.name, v))
     case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =>
       Some(sources.LessThanOrEqual(a.name, v))
     case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =>
       Some(sources.LessThanOrEqual(a.name, v))
     case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =>
       Some(sources.GreaterThanOrEqual(a.name, v))
     case expressions.InSet(a: Attribute, set) =>
       Some(sources.In(a.name, set.toArray))
     case expressions.IsNull(a: Attribute) =>
       Some(sources.IsNull(a.name))
     case expressions.IsNotNull(a: Attribute) =>
       Some(sources.IsNotNull(a.name))
     case expressions.And(left, right) =>
       (translate(left) ++ translate(right)).reduceOption(sources.And)
     case expressions.Or(left, right) =>
       for {
         leftFilter <- translate(left)
         rightFilter <- translate(right)
       } yield sources.Or(leftFilter, rightFilter)
     case expressions.Not(child) =>
       translate(child).map(sources.Not)
     case _ => None
   }
   filters.flatMap(translate)
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods

2015-03-03 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344813#comment-14344813
 ] 

Littlestar commented on SPARK-6049:
---

I tested 1.3.0 RC1; it fixed SPARK-3675 and SPARK-4865.
It works for me.

 HiveThriftServer2 may expose Inheritable methods
 

 Key: SPARK-6049
 URL: https://issues.apache.org/jira/browse/SPARK-6049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
  Labels: HiveThriftServer2

 Does HiveThriftServer2 may expose Inheritable methods?
 HiveThriftServer2  is very good when used as a JDBC Server, but 
 HiveThriftServer2.scala  is not Inheritable or invokable by app.
 My app use JavaSQLContext and registerTempTable.
 I want to expose these TempTables by HiveThriftServer2(JDBC Server).
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5834) spark 1.2.1 officical package bundled with httpclient 4.1.2 is too old

2015-03-03 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346349#comment-14346349
 ] 

Littlestar commented on SPARK-5834:
---

Spark 1.3.0 RC1/RC2 bundles httpclient 4.3.6, which works OK.

 spark 1.2.1 officical package bundled with httpclient 4.1.2 is too old
 --

 Key: SPARK-5834
 URL: https://issues.apache.org/jira/browse/SPARK-5834
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

  I see 
 spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
  It indicates that officical package only use httpclient 4.1.2.
 some spark module requires httpclient 4.2 and above.
 https://github.com/apache/spark/pull/2489/files ( 
 commons.httpclient.version4.2/commons.httpclient.version)
 https://github.com/apache/spark/pull/2535/files 
 (commons.httpclient.version4.2.6/commons.httpclient.version)
 I think httpclient 4.1.2 is too old, standard distribution may conflict with 
 other httpclient required user app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6151) schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size

2015-03-03 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6151:
--
Component/s: SQL

 schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
 ---

 Key: SPARK-6151
 URL: https://issues.apache.org/jira/browse/SPARK-6151
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Trivial

 How schemaRDD to parquetfile with saveAsParquetFile control the HDFS block 
 size. may be Configuration need.
 related question by others.
 http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-block-size-for-parquet-output-tt21183.html
 http://qnalist.com/questions/5054892/spark-sql-parquet-and-impala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6151) schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size

2015-03-03 Thread Littlestar (JIRA)
Littlestar created SPARK-6151:
-

 Summary: schemaRDD to parquetfile with saveAsParquetFile control 
the HDFS block size
 Key: SPARK-6151
 URL: https://issues.apache.org/jira/browse/SPARK-6151
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Trivial


How can a SchemaRDD written to a Parquet file with saveAsParquetFile control the HDFS block size? Maybe a Configuration option is needed.

Related questions from others:
http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-block-size-for-parquet-output-tt21183.html
http://qnalist.com/questions/5054892/spark-sql-parquet-and-impala
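
One possible workaround (a sketch only, not verified against this Spark version): set the Parquet row-group size and the HDFS block size on the SparkContext's Hadoop Configuration before writing. The keys parquet.block.size and dfs.blocksize are the standard Parquet/Hadoop names and are assumed to be what the Parquet writer picks up; the paths and the 1.3-style SQLContext/DataFrame API below are illustrative.

{noformat}
// Hedged sketch: tune Parquet row-group size and HDFS block size via the Hadoop Configuration.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class ParquetBlockSizeExample {
    public static void main(String[] args) {
        JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("ParquetBlockSize"));
        SQLContext sqlCtx = new SQLContext(ctx);

        long blockSize = 256L * 1024 * 1024; // 256 MB, an illustrative value
        // Parquet row-group size and HDFS block size for files written by saveAsParquetFile.
        ctx.hadoopConfiguration().set("parquet.block.size", Long.toString(blockSize));
        ctx.hadoopConfiguration().set("dfs.blocksize", Long.toString(blockSize));

        DataFrame df = sqlCtx.parquetFile("/parquetdir");   // hypothetical input path
        df.saveAsParquetFile("/parquetdir-256mb");          // hypothetical output path
    }
}
{noformat}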




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods

2015-03-03 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340018#comment-14340018
 ] 

Littlestar edited comment on SPARK-6049 at 3/3/15 9:59 AM:
---

Thanks. The call works via the static method, but no tables are found.
Do I need to open a new JIRA issue?

{noformat}
SparkConf sparkConf = new SparkConf().setAppName("ParquetFileQuery");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
SQLContext sqlCtx = new SQLContext(ctx);
HiveContext hiveCtx = new HiveContext(ctx.sc());
DataFrame mids = sqlCtx.parquetFile("/parquetdir");
hiveCtx.registerDataFrameAsTable(mids, "demo");

HiveThriftServer2.startWithContext(hiveCtx);
{noformat}


was (Author: cnstar9988):
Thanks. The call works via the static method, but no tables are found.
Do I need to open a new JIRA issue?

{noformat}
SparkConf sparkConf = new SparkConf().setAppName("ParquetFileQuery").set("spark.executor.memory", "4g");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaSQLContext sqlCtx = new JavaSQLContext(ctx);
JavaSchemaRDD mids = sqlCtx.parquetFile("/parquetdir");
mids.registerTempTable("demo");

HiveThriftServer2.startWithContext(new HiveContext(ctx.sc()));
{noformat}

 HiveThriftServer2 may expose Inheritable methods
 

 Key: SPARK-6049
 URL: https://issues.apache.org/jira/browse/SPARK-6049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
  Labels: HiveThriftServer2

 Does HiveThriftServer2 may expose Inheritable methods?
 HiveThriftServer2  is very good when used as a JDBC Server, but 
 HiveThriftServer2.scala  is not Inheritable or invokable by app.
 My app use JavaSQLContext and registerTempTable.
 I want to expose these TempTables by HiveThriftServer2(JDBC Server).
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java

2015-03-02 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6131:
--
Description: 
I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html

I download 1.3.0(RC1) from 
http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz

Just for testing, I built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found.

WARN:
no JavaSQLContext.scala/java in spark-1.3.0.tgz.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -

Maybe some files are missing in the 1.3.0 RC1 source.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\
spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\***

Thanks.


  was:
I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html

I download 1.3.0(RC1) from 
http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz

Just only for test, build myself, but thers is no 
org/apache/spark/sql/api/java/JavaSQLContext found.

WARN:
no JavaSQLContext.scala/java in spark-1.3.0.tgz.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -

Maybe some files missed in 1.3.0 RC1 source.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\

Thanks.



 Spark 1.3.0 (RC1) missing some source files in sql.api.java
 ---

 Key: SPARK-6131
 URL: https://issues.apache.org/jira/browse/SPARK-6131
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Critical

 I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html
 I download 1.3.0(RC1) from 
 http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz
 Just only for test, build myself, but thers is no 
 org/apache/spark/sql/api/java/JavaSQLContext found.
 WARN:
 no JavaSQLContext.scala/java in spark-1.3.0.tgz.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -
 Maybe some files missed in 1.3.0 RC1 source.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\
 spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\***
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java

2015-03-02 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6131:
--
Comment: was deleted

(was: spark-1.2.1 is ok.
spark-1.2.1-src.jar\sql\core\src\main\scala\org\apache\spark\sql\api\JavaSQLContext.scala)

 Spark 1.3.0 (RC1) missing some source files in sql.api.java
 ---

 Key: SPARK-6131
 URL: https://issues.apache.org/jira/browse/SPARK-6131
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html
 I download 1.3.0(RC1) from 
 http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz
 Just only for test, build myself, but thers is no 
 org/apache/spark/sql/api/java/JavaSQLContext found.
 WARN:
 no JavaSQLContext.scala/java in spark-1.3.0.tgz.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -
 Maybe some files missed in 1.3.0 RC1 source.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\
 spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\***
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java

2015-03-02 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344584#comment-14344584
 ] 

Littlestar commented on SPARK-6131:
---

spark-1.2.1 is ok.
spark-1.2.1-src.jar\sql\core\src\main\scala\org\apache\spark\sql\api\JavaSQLContext.scala

 Spark 1.3.0 (RC1) missing some source files in sql.api.java
 ---

 Key: SPARK-6131
 URL: https://issues.apache.org/jira/browse/SPARK-6131
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Critical

 I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html
 I download 1.3.0(RC1) from 
 http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz
 Just only for test, build myself, but thers is no 
 org/apache/spark/sql/api/java/JavaSQLContext found.
 WARN:
 no JavaSQLContext.scala/java in spark-1.3.0.tgz.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -
 Maybe some files missed in 1.3.0 RC1 source.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\
 spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\***
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java

2015-03-02 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6131:
--
Priority: Minor  (was: Critical)

 Spark 1.3.0 (RC1) missing some source files in sql.api.java
 ---

 Key: SPARK-6131
 URL: https://issues.apache.org/jira/browse/SPARK-6131
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor

 I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html
 I download 1.3.0(RC1) from 
 http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz
 Just only for test, build myself, but thers is no 
 org/apache/spark/sql/api/java/JavaSQLContext found.
 WARN:
 no JavaSQLContext.scala/java in spark-1.3.0.tgz.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -
 Maybe some files missed in 1.3.0 RC1 source.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\
 spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\***
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java

2015-03-02 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344604#comment-14344604
 ] 

Littlestar commented on SPARK-6131:
---

spark 1.2.1: 
sql\core\src\main\scala\org\apache\spark\sql\api\java\JavaSQLContext.scala
spark 1.3.0 RC1 has no JavaSQLContext.scala.


 Spark 1.3.0 (RC1) missing some source files in sql.api.java
 ---

 Key: SPARK-6131
 URL: https://issues.apache.org/jira/browse/SPARK-6131
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Critical

 I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html
 I download 1.3.0(RC1) from 
 http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz
 Just only for test, build myself, but thers is no 
 org/apache/spark/sql/api/java/JavaSQLContext found.
 WARN:
 no JavaSQLContext.scala/java in spark-1.3.0.tgz.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -
 Maybe some files missed in 1.3.0 RC1 source.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\
 spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\***
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java

2015-03-02 Thread Littlestar (JIRA)
Littlestar created SPARK-6131:
-

 Summary: Spark 1.3.0 (RC1) missing some source files in 
sql.api.java
 Key: SPARK-6131
 URL: https://issues.apache.org/jira/browse/SPARK-6131
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Critical


I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
I download 1.3.0(RC1) from 
http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz

and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found.

WARN:
JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz,
but I checked that spark-1.3.0 has no JavaSQLContext.scala/java in spark-1.3.0.tgz.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -

Maybe some files are missing in the 1.3.0 RC1 source.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\

Thanks.
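
For what it's worth, the class may be absent on purpose rather than lost in packaging: the 1.3 line appears to fold the Java-specific wrappers into SQLContext itself (a hedged reading, based on the Java snippets elsewhere in this archive that construct SQLContext directly from a JavaSparkContext). A minimal sketch of that 1.3-style usage, with illustrative paths and table names:

{noformat}
// Hedged sketch: 1.3-style Java usage without JavaSQLContext/JavaSchemaRDD.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class NoJavaSQLContext {
    public static void main(String[] args) {
        JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("NoJavaSQLContext"));
        SQLContext sqlCtx = new SQLContext(ctx);           // usable directly from Java in 1.3
        DataFrame df = sqlCtx.parquetFile("/parquetdir");  // hypothetical input path
        df.registerTempTable("demo");                      // hypothetical table name
        sqlCtx.sql("SELECT COUNT(*) FROM demo").show();
    }
}
{noformat}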




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java

2015-03-02 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6131:
--
Description: 
I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html

I download 1.3.0(RC1) from 
http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz

Just for testing, I built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found.

WARN:
JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz.

But there is no JavaSQLContext.scala/java in spark-1.3.0.tgz.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -

Maybe some files are missing in the 1.3.0 RC1 source.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\

Thanks.


  was:
I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
I download 1.3.0(RC1) from 
http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz

and build myself, but thers is no org/apache/spark/sql/api/java/JavaSQLContext 
found.

WARN:
but JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz.

I checked that spark-1.3.0 has no JavaSQLContext.scala/java in spark-1.3.0.tgz.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -

Maybe some files missed in 1.3.0 RC1 source.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\

Thanks.



 Spark 1.3.0 (RC1) missing some source files in sql.api.java
 ---

 Key: SPARK-6131
 URL: https://issues.apache.org/jira/browse/SPARK-6131
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Critical

 I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html
 I download 1.3.0(RC1) from 
 http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz
 Just only for test, build myself, but thers is no 
 org/apache/spark/sql/api/java/JavaSQLContext found.
 WARN:
 JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz.
 But has no JavaSQLContext.scala/java in spark-1.3.0.tgz.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -
 Maybe some files missed in 1.3.0 RC1 source.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java

2015-03-02 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-6131:
--
Description: 
I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html

I download 1.3.0(RC1) from 
http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz

Just for testing, I built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found.

WARN:
no JavaSQLContext.scala/java in spark-1.3.0.tgz.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -

Maybe some files are missing in the 1.3.0 RC1 source.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\

Thanks.


  was:
I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html

I download 1.3.0(RC1) from 
http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz

Just only for test, build myself, but thers is no 
org/apache/spark/sql/api/java/JavaSQLContext found.

WARN:
JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz.

But has no JavaSQLContext.scala/java in spark-1.3.0.tgz.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -

Maybe some files missed in 1.3.0 RC1 source.
spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\

Thanks.



 Spark 1.3.0 (RC1) missing some source files in sql.api.java
 ---

 Key: SPARK-6131
 URL: https://issues.apache.org/jira/browse/SPARK-6131
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Critical

 I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html
 I download 1.3.0(RC1) from 
 http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz
 Just only for test, build myself, but thers is no 
 org/apache/spark/sql/api/java/JavaSQLContext found.
 WARN:
 no JavaSQLContext.scala/java in spark-1.3.0.tgz.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java -
 Maybe some files missed in 1.3.0 RC1 source.
 spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods

2015-02-27 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar resolved SPARK-6049.
---
Resolution: Invalid

Very sorry, HiveThriftServer2.startWithContext can be called from Java.
I will open another JIRA issue for the new problem.

 HiveThriftServer2 may expose Inheritable methods
 

 Key: SPARK-6049
 URL: https://issues.apache.org/jira/browse/SPARK-6049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
  Labels: HiveThriftServer2

 Does HiveThriftServer2 may expose Inheritable methods?
 HiveThriftServer2  is very good when used as a JDBC Server, but 
 HiveThriftServer2.scala  is not Inheritable or invokable by app.
 My app use JavaSQLContext and registerTempTable.
 I want to expose these TempTables by HiveThriftServer2(JDBC Server).
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods

2015-02-27 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340096#comment-14340096
 ] 

Littlestar edited comment on SPARK-6049 at 2/27/15 2:43 PM:


Very sorry, HiveThriftServer2.startWithContext can be called from Java; see SPARK-3675.

My question is similar to SPARK-4865.


was (Author: cnstar9988):
very sorry, HiveThriftServer2.startWithContext can be called by java.
I will open another JIRA issue for new problem.

 HiveThriftServer2 may expose Inheritable methods
 

 Key: SPARK-6049
 URL: https://issues.apache.org/jira/browse/SPARK-6049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
  Labels: HiveThriftServer2

 Does HiveThriftServer2 may expose Inheritable methods?
 HiveThriftServer2  is very good when used as a JDBC Server, but 
 HiveThriftServer2.scala  is not Inheritable or invokable by app.
 My app use JavaSQLContext and registerTempTable.
 I want to expose these TempTables by HiveThriftServer2(JDBC Server).
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods

2015-02-27 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340018#comment-14340018
 ] 

Littlestar commented on SPARK-6049:
---

Thanks. The call works via the static method, but no tables are found.
Do I need to open a new JIRA issue?

{noformat}
SparkConf sparkConf = new SparkConf().setAppName("ParquetFileQuery").set("spark.executor.memory", "4g");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaSQLContext sqlCtx = new JavaSQLContext(ctx);
JavaSchemaRDD mids = sqlCtx.parquetFile("/parquetdir");
mids.registerTempTable("demo");

HiveThriftServer2.startWithContext(new HiveContext(ctx.sc()));
{noformat}

 HiveThriftServer2 may expose Inheritable methods
 

 Key: SPARK-6049
 URL: https://issues.apache.org/jira/browse/SPARK-6049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
  Labels: HiveThriftServer2

 Does HiveThriftServer2 may expose Inheritable methods?
 HiveThriftServer2  is very good when used as a JDBC Server, but 
 HiveThriftServer2.scala  is not Inheritable or invokable by app.
 My app use JavaSQLContext and registerTempTable.
 I want to expose these TempTables by HiveThriftServer2(JDBC Server).
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods

2015-02-27 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339980#comment-14339980
 ] 

Littlestar commented on SPARK-6049:
---

HiveThriftServer2.scala can't be called directly from another program because of its private constructor.
Its main method has a lot of Scala code that can't be rewritten in Java.

I request HiveThriftServer2.scala to be refactored to expose a more 
programmatic API.

My app:
  main() {
     JavaSQLContext sqlContext;
     registerTempTable("table_1");
     registerTempTable("table_2");
     // I want to start up HiveThriftServer2 (the JDBC server) here to expose table_1 and table_2
}

HiveThriftServer2.scala#startWithContext has the DeveloperApi annotation, but startWithContext is not invokable by the app.

 /**
   * :: DeveloperApi ::
   * Starts a new thrift server with the given context.
   */
  @DeveloperApi
  def startWithContext(sqlContext: HiveContext): Unit = {
val server = new HiveThriftServer2(sqlContext)
server.init(sqlContext.hiveconf)
server.start()
  }

thanks.
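
For reference, a minimal Java sketch of the pattern shown elsewhere in this thread (register the temp tables on the same HiveContext that is handed to startWithContext). It assumes Spark 1.3-style APIs (DataFrame, registerDataFrameAsTable) and illustrative paths and table names:

{noformat}
// Hedged sketch: register temp tables on the same HiveContext handed to the thrift server.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;

public class EmbeddedJdbcServer {
    public static void main(String[] args) {
        JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("EmbeddedJdbcServer"));
        HiveContext hiveCtx = new HiveContext(ctx.sc());
        DataFrame table1 = hiveCtx.parquetFile("/parquetdir"); // hypothetical input path
        hiveCtx.registerDataFrameAsTable(table1, "table_1");
        // Tables are only visible to JDBC clients if the *same* context is passed here.
        HiveThriftServer2.startWithContext(hiveCtx);
    }
}
{noformat}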



 HiveThriftServer2 may expose Inheritable methods
 

 Key: SPARK-6049
 URL: https://issues.apache.org/jira/browse/SPARK-6049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
  Labels: HiveThriftServer2

 Does HiveThriftServer2 may expose Inheritable methods?
 HiveThriftServer2  is very good when used as a JDBC Server, but 
 HiveThriftServer2.scala  is not Inheritable or invokable by app.
 My app use JavaSQLContext and registerTempTable.
 I want to expose these TempTables by HiveThriftServer2(JDBC Server).
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods

2015-02-27 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339980#comment-14339980
 ] 

Littlestar edited comment on SPARK-6049 at 2/27/15 10:33 AM:
-

HiveThriftServer2.scala can't be called directly from another program because of its private constructor.
Its main method has a lot of Scala code that can't be rewritten in Java.

I request HiveThriftServer2.scala to be refactored to expose a more 
programmatic API.
{noformat}
My app:
  main() {
     JavaSQLContext sqlContext;
     registerTempTable("table_1");
     registerTempTable("table_2");
     // I want to start up HiveThriftServer2 (the JDBC server) here to expose table_1 and table_2
}
{noformat}
HiveThriftServer2.scala#startWithContext has the DeveloperApi annotation, but startWithContext is not invokable by the app.
{noformat}
 /**
   * :: DeveloperApi ::
   * Starts a new thrift server with the given context.
   */
  @DeveloperApi
  def startWithContext(sqlContext: HiveContext): Unit = {
val server = new HiveThriftServer2(sqlContext)
server.init(sqlContext.hiveconf)
server.start()
  }
{noformat}
thanks.




was (Author: cnstar9988):
HiveThriftServer2.scala is can't call directly in other program becuase of 
privated constructor.
It's main has lot of scala code can't rewrite by java.

I request HiveThriftServer2.scala to be refactored to expose a more 
programmatic API.

My app:
  main() {
 JavaSQLContext sqlContex;
 registerTempTable(table_1);
 registerTempTable(table_2);
 I want to startup HiveThriftServer2(JDBC Server) here expose table_1 
and table_2
}

HiveThriftServer2.scala#startWithContext has DeveloperApi annotation, but 
startWithContext is not invokable by app.

 /**
   * :: DeveloperApi ::
   * Starts a new thrift server with the given context.
   */
  @DeveloperApi
  def startWithContext(sqlContext: HiveContext): Unit = {
val server = new HiveThriftServer2(sqlContext)
server.init(sqlContext.hiveconf)
server.start()
  }

thanks.



 HiveThriftServer2 may expose Inheritable methods
 

 Key: SPARK-6049
 URL: https://issues.apache.org/jira/browse/SPARK-6049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
  Labels: HiveThriftServer2

 Does HiveThriftServer2 may expose Inheritable methods?
 HiveThriftServer2  is very good when used as a JDBC Server, but 
 HiveThriftServer2.scala  is not Inheritable or invokable by app.
 My app use JavaSQLContext and registerTempTable.
 I want to expose these TempTables by HiveThriftServer2(JDBC Server).
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods

2015-02-26 Thread Littlestar (JIRA)
Littlestar created SPARK-6049:
-

 Summary: HiveThriftServer2 may expose Inheritable methods
 Key: SPARK-6049
 URL: https://issues.apache.org/jira/browse/SPARK-6049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor


Could HiveThriftServer2 expose inheritable methods?
HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app.

My app uses JavaSQLContext and registerTempTable.
I want to expose these temp tables through HiveThriftServer2 (the JDBC server).

Thanks.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-16 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323529#comment-14323529
 ] 

Littlestar commented on SPARK-5829:
---

The following code does the same as saveAsNewAPIHadoopFiles and runs OK for me.
Noting it here.

{noformat}
import org.apache.spark.streaming.Time;
import org.apache.spark.streaming.api.java.JavaPairDStream;
...
final String outDir = "/testspark/out";
is.foreachRDD(new Function2<JavaPairRDD<Integer, Integer>, Time, Void>() {
    @Override
    public Void call(JavaPairRDD<Integer, Integer> t1, Time t2) throws Exception {
        if (t1.partitions().size() > 0) {
            t1.saveAsNewAPIHadoopFile(outDir + t2.milliseconds() + "tmp",
                    Integer.class, Integer.class, TextOutputFormat.class);
        }
        return null;
    }
});

{noformat}

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar
Priority: Minor

 spark master (1.3.0) with SPARK-5826 patch.
 JavaStreamingContext.fileStream run task repeated empty when no more new files
 reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. run app.
   3. put some text files into /testspark/watchdir.
 every 30 seconds, spark log indicates that a new sub task runs.
 and /testspark/resultdir/ has new directory with empty files every 30 seconds.
 when no new files add, but it runs new task with empy rdd.
 {noformat}
 package my.test.hadoop.spark;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;
 import scala.Tuple2;

 public class TestStream {
     @SuppressWarnings({ "serial", "resource" })
     public static void main(String[] args) throws Exception {
         SparkConf conf = new SparkConf().setAppName("TestStream");
         JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
         jssc.checkpoint("/testspark/checkpointdir");
         Configuration jobConf = new Configuration();
         jobConf.set("my.test.fields", "fields");

         JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                 LongWritable.class, Text.class, TextInputFormat.class,
                 new Function<Path, Boolean>() {
                     @Override
                     public Boolean call(Path v1) throws Exception {
                         return true;
                     }
                 }, true, jobConf).mapToPair(
                 new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                     @Override
                     public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                         return new Tuple2<Integer, Integer>(1, 1);
                     }
                 });

         JavaPairDStream<Integer, Integer> rs = is.reduceByKey(
                 new Function2<Integer, Integer, Integer>() {
                     @Override
                     public Integer call(Integer arg0, Integer arg1) throws Exception {
                         return arg0 + arg1;
                     }
                 });
         rs.checkpoint(Durations.seconds(60));
         rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                 Integer.class, Integer.class, TextOutputFormat.class);

         jssc.start();
         jssc.awaitTermination();
     }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-16 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323619#comment-14323619
 ] 

Littlestar commented on SPARK-5829:
---

But it depends on org.apache.spark.streaming.Time, which is not easy to call.
It seems this is a necessary consequence of the current design and can be partially worked around by filtering in user space.
SPARK-3292

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar
Priority: Minor

 spark master (1.3.0) with SPARK-5826 patch.
 JavaStreamingContext.fileStream run task repeated empty when no more new files
 reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. run app.
   3. put some text files into /testspark/watchdir.
 every 30 seconds, spark log indicates that a new sub task runs.
 and /testspark/resultdir/ has new directory with empty files every 30 seconds.
 when no new files add, but it runs new task with empy rdd.
 {noformat}
 package my.test.hadoop.spark;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;
 import scala.Tuple2;

 public class TestStream {
     @SuppressWarnings({ "serial", "resource" })
     public static void main(String[] args) throws Exception {
         SparkConf conf = new SparkConf().setAppName("TestStream");
         JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
         jssc.checkpoint("/testspark/checkpointdir");
         Configuration jobConf = new Configuration();
         jobConf.set("my.test.fields", "fields");

         JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                 LongWritable.class, Text.class, TextInputFormat.class,
                 new Function<Path, Boolean>() {
                     @Override
                     public Boolean call(Path v1) throws Exception {
                         return true;
                     }
                 }, true, jobConf).mapToPair(
                 new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                     @Override
                     public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                         return new Tuple2<Integer, Integer>(1, 1);
                     }
                 });

         JavaPairDStream<Integer, Integer> rs = is.reduceByKey(
                 new Function2<Integer, Integer, Integer>() {
                     @Override
                     public Integer call(Integer arg0, Integer arg1) throws Exception {
                         return arg0 + arg1;
                     }
                 });
         rs.checkpoint(Durations.seconds(60));
         rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                 Integer.class, Integer.class, TextOutputFormat.class);

         jssc.start();
         jssc.awaitTermination();
     }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module

2015-02-16 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322559#comment-14322559
 ] 

Littlestar commented on SPARK-3638:
---

Oh, it was introduced in the kinesis-asl profile only.
I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a newer httpclient.
Now I build Spark with the kinesis-asl profile, and it works OK with httpclient 4.2.6. Thanks.

mvn dependency:tree
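
Besides mvn dependency:tree, a quick runtime check of which httpclient actually wins on the classpath can help when diagnosing this kind of conflict. The sketch below uses org.apache.http.util.VersionInfo, which I believe ships with httpcore; treat the exact call as an assumption to verify:

{noformat}
// Hedged sketch: print the HttpClient release that the driver classloader resolved.
import org.apache.http.client.HttpClient;
import org.apache.http.util.VersionInfo;

public class HttpClientVersionCheck {
    public static void main(String[] args) {
        VersionInfo vi = VersionInfo.loadVersionInfo(
                "org.apache.http.client", HttpClient.class.getClassLoader());
        System.out.println("httpclient release on classpath: "
                + (vi != null ? vi.getRelease() : "unknown"));
    }
}
{noformat}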



 Commons HTTP client dependency conflict in extras/kinesis-asl module
 

 Key: SPARK-3638
 URL: https://issues.apache.org/jira/browse/SPARK-3638
 Project: Spark
  Issue Type: Bug
  Components: Examples, Streaming
Affects Versions: 1.1.0
Reporter: Aniket Bhatnagar
  Labels: dependencies
 Fix For: 1.1.1, 1.2.0


 Followed instructions as mentioned @ 
 https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md
  and when running the example, I get the following error:
 {code}
 Caused by: java.lang.NoSuchMethodError: 
 org.apache.http.impl.conn.DefaultClientConnectionOperator.init(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.init(PoolingClientConnectionManager.java:114)
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.init(PoolingClientConnectionManager.java:99)
 at 
 com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
 at 
 com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
 at 
 com.amazonaws.http.AmazonHttpClient.init(AmazonHttpClient.java:181)
 at 
 com.amazonaws.AmazonWebServiceClient.init(AmazonWebServiceClient.java:119)
 at 
 com.amazonaws.AmazonWebServiceClient.init(AmazonWebServiceClient.java:103)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:136)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:117)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.init(AmazonKinesisAsyncClient.java:132)
 {code}
 I believe this is due to the dependency conflict as described @ 
 http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5834) spark 1.2.1 officical package bundled with httpclient 4.1.2 is too old

2015-02-16 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5834:
--
Description: 
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that the official package only uses httpclient 4.1.2.

Some Spark modules require httpclient 4.2 and above:
https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>)
https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>)

I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a newer httpclient.


  was:
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that officical package only use httpclient 4.1.2.

some spark module required httpclient 4.2 and above.
https://github.com/apache/spark/pull/2489/files ( 
commons.httpclient.version4.2/commons.httpclient.version)
https://github.com/apache/spark/pull/2535/files 
(commons.httpclient.version4.2.6/commons.httpclient.version)

I think httpclient 4.1.2 is too old, standard distribution may conflict with 
other httpclient required user app.



 spark 1.2.1 officical package bundled with httpclient 4.1.2 is too old
 --

 Key: SPARK-5834
 URL: https://issues.apache.org/jira/browse/SPARK-5834
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

  I see 
 spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
  It indicates that officical package only use httpclient 4.1.2.
 some spark module requires httpclient 4.2 and above.
 https://github.com/apache/spark/pull/2489/files ( 
 commons.httpclient.version4.2/commons.httpclient.version)
 https://github.com/apache/spark/pull/2535/files 
 (commons.httpclient.version4.2.6/commons.httpclient.version)
 I think httpclient 4.1.2 is too old, standard distribution may conflict with 
 other httpclient required user app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5834) spark 1.2.1 officical package bundled with httpclient 4.1.2 is too old

2015-02-16 Thread Littlestar (JIRA)
Littlestar created SPARK-5834:
-

 Summary: spark 1.2.1 officical package bundled with httpclient 
4.1.2 is too old
 Key: SPARK-5834
 URL: https://issues.apache.org/jira/browse/SPARK-5834
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.2.1
Reporter: Littlestar


In assembly-1.1.1-hadoop2.4.0.jar the class HttpPatch, which was introduced in 4.2, is not there.
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that the official package only uses httpclient 4.1.2.

Some Spark modules require httpclient 4.2 and above:
https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>)
https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>)

I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a newer httpclient.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5834) spark 1.2.1 officical package bundled with httpclient 4.1.2 is too old

2015-02-16 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5834:
--
Description: 
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that the official package only uses httpclient 4.1.2.

Some Spark modules require httpclient 4.2 and above:
https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>)
https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>)

I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a newer httpclient.


  was:
assembly-1.1.1-hadoop2.4.0.jar the class HttpPatch is not there which was 
introduced in 4.2
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that officical package only use httpclient 4.1.2.

some spark module required httpclient 4.2 and above.
https://github.com/apache/spark/pull/2489/files ( 
commons.httpclient.version4.2/commons.httpclient.version)
https://github.com/apache/spark/pull/2535/files 
(commons.httpclient.version4.2.6/commons.httpclient.version)

I think httpclient 4.1.2 is too old, standard distribution may conflict with 
other httpclient required user app.


   Priority: Minor  (was: Major)

 spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
 --

 Key: SPARK-5834
 URL: https://issues.apache.org/jira/browse/SPARK-5834
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

  I see 
 spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
  It indicates that the official package only bundles httpclient 4.1.2.
 Some Spark modules require httpclient 4.2 and above:
 https://github.com/apache/spark/pull/2489/files 
 (<commons.httpclient.version>4.2</commons.httpclient.version>)
 https://github.com/apache/spark/pull/2535/files 
 (<commons.httpclient.version>4.2.6</commons.httpclient.version>)
 I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
 user applications that require a newer httpclient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321862#comment-14321862
 ] 

Littlestar commented on SPARK-5826:
---

I put some txt files into /testspark/watchdir, and it throws a NullPointerException:

15/02/15 16:18:20 WARN dstream.FileInputDStream: Error finding new files
java.lang.NullPointerException
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$fn$3$1.apply(JavaStreamingContext.scala:329)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$fn$3$1.apply(JavaStreamingContext.scala:329)
at 
org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$isNewFile(FileInputDStream.scala:215)
at 
org.apache.spark.streaming.dstream.FileInputDStream$$anon$3.accept(FileInputDStream.scala:172)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1489)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1523)
at 
org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:174)
at 
org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:132)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285)
at 
org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:35)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285)
at 
org.apache.spark.streaming.dstream.ShuffledDStream.compute(ShuffledDStream.scala:41)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
at scala.Option.orElse(Option.scala:257)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285)
at 
org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
at 
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
at 
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at 
org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:232)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:230)
at scala.util.Try$.apply(Try.scala:161)
at 
org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:230)
at 
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:167)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
at 

[jira] [Created] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)
Littlestar created SPARK-5826:
-

 Summary: JavaStreamingContext.fileStream cause Configuration 
NotSerializableException
 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor


org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
directory, Class<LongWritable> kClass, Class<Text> vClass, 
Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
newFilesOnly, Configuration conf)

I use JavaStreamingContext.fileStream on 1.3.0/master, passing a Configuration,

but it throws a strange exception:

java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
at 
org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
at 
org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at 
org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
at 
org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
at 
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 

[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5826:
--
Attachment: TestStream.java

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master, passing a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at 

[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5826:
--
Attachment: (was: TestStream.java)

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master, passing a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at 

[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321848#comment-14321848
 ] 

Littlestar commented on SPARK-5795:
---

works for me, thanks.

 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStreamCompile.java


 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 The following code can't be compiled from Java:
 JavaPairDStream<Integer, Integer> rs = ...
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, 
 TextOutputFormat.class, jobConf);
 but similar code on JavaPairRDD works fine:
 JavaPairRDD<String, String> counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, 
 TextOutputFormat.class, jobConf);
 
 Maybe the definition
   def saveAsNewAPIHadoopFiles(
     prefix: String,
     suffix: String,
     keyClass: Class[_],
     valueClass: Class[_],
     outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
     conf: Configuration = new Configuration) {
   dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
     outputFormatClass, conf)
   }
 should be changed to
   def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
     prefix: String,
     suffix: String,
     keyClass: Class[_],
     valueClass: Class[_],
     outputFormatClass: Class[F],
     conf: Configuration = new Configuration) {
   dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
     outputFormatClass, conf)
   }
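
Until the signature change lands, one possible way to get the call through javac against the 
old signature is to erase the class literal's generic type first; this is a sketch of my own 
for illustration (rs and jobConf as in the snippet above), and it trades compile-time safety 
for an unchecked warning:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public class SaveWorkaround {
    // Hedged workaround sketch: erase the class literal's generic type so it matches
    // the Class[_ <: NewOutputFormat[_, _]] parameter when called from Java.
    // This compiles with an unchecked/rawtypes warning instead of failing outright.
    @SuppressWarnings({ "unchecked", "rawtypes" })
    static void save(JavaPairDStream<Integer, Integer> rs, Configuration jobConf) {
        Class<? extends OutputFormat<?, ?>> outputFormat = (Class) TextOutputFormat.class;
        rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class,
                outputFormat, jobConf);
    }
}
{noformat}
With the change from pull/4608 applied (confirmed working in the comment above), the direct 
call with TextOutputFormat.class compiles and the cast is unnecessary.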



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321849#comment-14321849
 ] 

Littlestar edited comment on SPARK-5826 at 2/15/15 7:51 AM:


Test code uploaded.

It throws an exception every 2 seconds:

15/02/15 15:50:35 ERROR actor.OneForOneStrategy: 
org.apache.hadoop.conf.Configuration
java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
at 
org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
at 
org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at 
org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
at 
org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
at 
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


was (Author: cnstar9988):
testcode upload.
!TestStream.java!

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: 

[jira] [Comment Edited] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321848#comment-14321848
 ] 

Littlestar edited comment on SPARK-5795 at 2/15/15 8:05 AM:


I merged pull/4608 and rebuilt; it works for me, thanks.


was (Author: cnstar9988):
works for me, thanks.

 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStreamCompile.java


 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 The following code can't be compiled from Java:
 JavaPairDStream<Integer, Integer> rs = ...
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, 
 TextOutputFormat.class, jobConf);
 but similar code on JavaPairRDD works fine:
 JavaPairRDD<String, String> counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, 
 TextOutputFormat.class, jobConf);
 
 Maybe the definition
   def saveAsNewAPIHadoopFiles(
     prefix: String,
     suffix: String,
     keyClass: Class[_],
     valueClass: Class[_],
     outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
     conf: Configuration = new Configuration) {
   dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
     outputFormatClass, conf)
   }
 should be changed to
   def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
     prefix: String,
     suffix: String,
     keyClass: Class[_],
     valueClass: Class[_],
     outputFormatClass: Class[F],
     conf: Configuration = new Configuration) {
   dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
     outputFormatClass, conf)
   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5826:
--
Attachment: TestStream.java

Test code uploaded.
!TestStream.java!

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master, passing a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at 

[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5826:
--
Component/s: Streaming

 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master, passing a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 

[jira] [Commented] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322317#comment-14322317
 ] 

Littlestar commented on SPARK-5826:
---

I merged pull/4612 and rebuilt; it works for me with no exception, thanks.
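
The merged fix is the proper solution; on builds without it, a minimal workaround sketch of my 
own (assuming the fileStream overload without the Configuration argument is available in your 
build) is to put the needed keys on the context's shared Hadoop configuration instead of 
handing a fresh Configuration to fileStream, so that no Configuration instance ends up in the 
checkpointed DStream graph:
{noformat}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class FileStreamWorkaround {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("FileStreamWorkaround");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Hedged sketch: set Hadoop properties on the shared Hadoop configuration rather
        // than passing a Configuration object into fileStream, keeping the DStream graph
        // free of non-serializable state when it is checkpointed.
        jssc.sparkContext().hadoopConfiguration().set("my.test.fields", "fields");

        JavaPairInputDStream<LongWritable, Text> files = jssc.fileStream(
                "/testspark/watchdir", LongWritable.class, Text.class, TextInputFormat.class,
                new Function<Path, Boolean>() {
                    @Override
                    public Boolean call(Path path) {
                        return true; // accept every new file
                    }
                }, true);

        files.print();
        jssc.start();
        jssc.awaitTermination();
    }
}
{noformat}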



 JavaStreamingContext.fileStream cause Configuration NotSerializableException
 

 Key: SPARK-5826
 URL: https://issues.apache.org/jira/browse/SPARK-5826
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Critical
 Attachments: TestStream.java


 org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String 
 directory, Class<LongWritable> kClass, Class<Text> vClass, 
 Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean 
 newFilesOnly, Configuration conf)
 I use JavaStreamingContext.fileStream on 1.3.0/master, passing a Configuration,
 but it throws a strange exception:
 java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440)
   at 
 org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177)
   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075)
   at 
 org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at 
 

[jira] [Created] (SPARK-5829) JavaStreamingContext.fileStream run task repeated empty when no more new files

2015-02-15 Thread Littlestar (JIRA)
Littlestar created SPARK-5829:
-

 Summary: JavaStreamingContext.fileStream run task repeated empty 
when no more new files
 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar


Spark master (1.3.0) with the SPARK-5826 patch.

JavaStreamingContext.fileStream keeps running empty tasks when there are no more new files.

To reproduce:
  1. mkdir /testspark/watchdir on HDFS.
  2. Run the app.
  3. Put some text files into /testspark/watchdir.
Every 30 seconds the Spark log indicates that a new sub task runs,
and /testspark/resultdir/ gets a new directory containing empty files every 30 seconds.

Even when no new files are added, it still runs a new task over an empty RDD.

{noformat}
package my.test.hadoop.spark;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class TestStream {
    @SuppressWarnings({ "serial", "resource" })
    public static void main(String[] args) throws Exception {

        SparkConf conf = new SparkConf().setAppName("TestStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
        jssc.checkpoint("/testspark/checkpointdir");
        Configuration jobConf = new Configuration();
        jobConf.set("my.test.fields", "fields");

        // Watch the directory and map every (offset, line) record to the pair (1, 1).
        JavaPairDStream<Integer, Integer> is = jssc.fileStream(
                "/testspark/watchdir", LongWritable.class, Text.class, TextInputFormat.class,
                new Function<Path, Boolean>() {
                    @Override
                    public Boolean call(Path v1) throws Exception {
                        return true;
                    }
                }, true, jobConf).mapToPair(
                new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                    @Override
                    public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                        return new Tuple2<Integer, Integer>(1, 1);
                    }
                });

        // Count the records in each batch by summing the 1s.
        JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });

        rs.checkpoint(Durations.seconds(60));
        rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                Integer.class, Integer.class, TextOutputFormat.class);
        jssc.start();
        jssc.awaitTermination();
    }
}

{noformat}
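
A possible stopgap for the empty output directories, offered as a sketch of my own under the 
assumption that the Java streaming API above is in use, is to replace saveAsNewAPIHadoopFiles 
with a foreachRDD that only writes batches containing records (the count() call does add one 
extra job per batch):
{noformat}
// Additional imports needed beyond those in the program above:
//   org.apache.spark.api.java.JavaPairRDD and org.apache.spark.streaming.Time
rs.foreachRDD(new Function2<JavaPairRDD<Integer, Integer>, Time, Void>() {
    @Override
    public Void call(JavaPairRDD<Integer, Integer> rdd, Time time) throws Exception {
        if (rdd.count() > 0) { // skip the empty RDDs produced when no new files arrive
            rdd.saveAsNewAPIHadoopFile(
                    "/testspark/resultdir/output-" + time.milliseconds() + ".suffix",
                    Integer.class, Integer.class, TextOutputFormat.class);
        }
        return null;
    }
});
{noformat}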



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5829) JavaStreamingContext.fileStream run task repeated empty when no more new files

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322321#comment-14322321
 ] 

Littlestar commented on SPARK-5829:
---

When I add new files into /testspark/watchdir, it runs a new task with good 
output.
When no new files are added, it still runs a new task with an empty RDD every 30 
seconds. (I think there is a bug in the handling of the no-new-files case.)

 JavaStreamingContext.fileStream run task repeated empty when no more new files
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar

 spark master (1.3.0) with SPARK-5826 patch.
 JavaStreamingContext.fileStream run task repeated empty when no more new files
 reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. run app.
   3. put some text files into /testspark/watchdir.
 every 30 seconds, spark log indicates that a new sub task runs.
 and /testspark/resultdir/ has new directory with empty files every 30 seconds.
 when no new files add, but it runs new task with empy rdd.
 {noformat}
 package my.test.hadoop.spark;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;
 import scala.Tuple2;
 public class TestStream {
   @SuppressWarnings({ serial, resource })
   public static void main(String[] args) throws Exception {
   
   SparkConf conf = new SparkConf().setAppName(TestStream);
   JavaStreamingContext jssc = new JavaStreamingContext(conf, 
 Durations.seconds(30));
   jssc.checkpoint(/testspark/checkpointdir);
   Configuration jobConf = new Configuration();
   jobConf.set(my.test.fields,fields);
 JavaPairDStreamInteger, Integer is = 
 jssc.fileStream(/testspark/watchdir, LongWritable.class, Text.class, 
 TextInputFormat.class, new FunctionPath, Boolean() {
 @Override
 public Boolean call(Path v1) throws Exception {
 return true;
 }
 }, true, jobConf).mapToPair(new PairFunctionTuple2LongWritable, 
 Text, Integer, Integer() {
 @Override
 public Tuple2Integer, Integer call(Tuple2LongWritable, Text 
 arg0) throws Exception {
 return new Tuple2Integer, Integer(1, 1);
 }
 });
   JavaPairDStreamInteger, Integer rs = is.reduceByKey(new 
 Function2Integer, Integer, Integer() {
   @Override
   public Integer call(Integer arg0, Integer arg1) throws 
 Exception {
   return arg0 + arg1;
   }
   });
   rs.checkpoint(Durations.seconds(60));
   rs.saveAsNewAPIHadoopFiles(/testspark/resultdir/output, 
 suffix, Integer.class, Integer.class, TextOutputFormat.class);
   jssc.start();
   jssc.awaitTermination();
   }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5829:
--
Summary: JavaStreamingContext.fileStream run task loop repeated  empty when 
no more new files found  (was: JavaStreamingContext.fileStream run task 
repeated empty when no more new files)

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar

 spark master (1.3.0) with SPARK-5826 patch.
 JavaStreamingContext.fileStream run task repeated empty when no more new files
 reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. run app.
   3. put some text files into /testspark/watchdir.
 every 30 seconds, spark log indicates that a new sub task runs.
 and /testspark/resultdir/ has new directory with empty files every 30 seconds.
 when no new files add, but it runs new task with empy rdd.
 {noformat}
 package my.test.hadoop.spark;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;
 import scala.Tuple2;
 public class TestStream {
   @SuppressWarnings({ serial, resource })
   public static void main(String[] args) throws Exception {
   
   SparkConf conf = new SparkConf().setAppName(TestStream);
   JavaStreamingContext jssc = new JavaStreamingContext(conf, 
 Durations.seconds(30));
   jssc.checkpoint(/testspark/checkpointdir);
   Configuration jobConf = new Configuration();
   jobConf.set(my.test.fields,fields);
 JavaPairDStreamInteger, Integer is = 
 jssc.fileStream(/testspark/watchdir, LongWritable.class, Text.class, 
 TextInputFormat.class, new FunctionPath, Boolean() {
 @Override
 public Boolean call(Path v1) throws Exception {
 return true;
 }
 }, true, jobConf).mapToPair(new PairFunctionTuple2LongWritable, 
 Text, Integer, Integer() {
 @Override
 public Tuple2Integer, Integer call(Tuple2LongWritable, Text 
 arg0) throws Exception {
 return new Tuple2Integer, Integer(1, 1);
 }
 });
   JavaPairDStreamInteger, Integer rs = is.reduceByKey(new 
 Function2Integer, Integer, Integer() {
   @Override
   public Integer call(Integer arg0, Integer arg1) throws 
 Exception {
   return arg0 + arg1;
   }
   });
   rs.checkpoint(Durations.seconds(60));
   rs.saveAsNewAPIHadoopFiles(/testspark/resultdir/output, 
 suffix, Integer.class, Integer.class, TextOutputFormat.class);
   jssc.start();
   jssc.awaitTermination();
   }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5829:
--
Priority: Minor  (was: Major)

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar
Priority: Minor

 spark master (1.3.0) with SPARK-5826 patch.
 JavaStreamingContext.fileStream run task repeated empty when no more new files
 reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. run app.
   3. put some text files into /testspark/watchdir.
 every 30 seconds, spark log indicates that a new sub task runs.
 and /testspark/resultdir/ has new directory with empty files every 30 seconds.
 when no new files add, but it runs new task with empy rdd.
 {noformat}
 package my.test.hadoop.spark;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;
 import scala.Tuple2;
 public class TestStream {
   @SuppressWarnings({ serial, resource })
   public static void main(String[] args) throws Exception {
   
   SparkConf conf = new SparkConf().setAppName(TestStream);
   JavaStreamingContext jssc = new JavaStreamingContext(conf, 
 Durations.seconds(30));
   jssc.checkpoint(/testspark/checkpointdir);
   Configuration jobConf = new Configuration();
   jobConf.set(my.test.fields,fields);
 JavaPairDStreamInteger, Integer is = 
 jssc.fileStream(/testspark/watchdir, LongWritable.class, Text.class, 
 TextInputFormat.class, new FunctionPath, Boolean() {
 @Override
 public Boolean call(Path v1) throws Exception {
 return true;
 }
 }, true, jobConf).mapToPair(new PairFunctionTuple2LongWritable, 
 Text, Integer, Integer() {
 @Override
 public Tuple2Integer, Integer call(Tuple2LongWritable, Text 
 arg0) throws Exception {
 return new Tuple2Integer, Integer(1, 1);
 }
 });
   JavaPairDStreamInteger, Integer rs = is.reduceByKey(new 
 Function2Integer, Integer, Integer() {
   @Override
   public Integer call(Integer arg0, Integer arg1) throws 
 Exception {
   return arg0 + arg1;
   }
   });
   rs.checkpoint(Durations.seconds(60));
   rs.saveAsNewAPIHadoopFiles(/testspark/resultdir/output, 
 suffix, Integer.class, Integer.class, TextOutputFormat.class);
   jssc.start();
   jssc.awaitTermination();
   }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-15 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322408#comment-14322408
 ] 

Littlestar edited comment on SPARK-5829 at 2/16/15 6:05 AM:


FileInputDStream.scala
{noformat}
  override def compute(validTime: Time): Option[RDD[(K, V)]] = {
    // Find new files
    val newFiles = findNewFiles(validTime.milliseconds)
    logInfo("New files at time " + validTime + ":\n" + newFiles.mkString("\n"))
    batchTimeToSelectedFiles += ((validTime, newFiles))
    recentlySelectedFiles ++= newFiles
    +maybe a check for {newFiles.size > 0} here could avoid this problem?+
    Some(filesToRDD(newFiles))
  }
{noformat}
Thanks.


was (Author: cnstar9988):
FileInputDStream.scala
  override def compute(validTime: Time): Option[RDD[(K, V)]] = {
    // Find new files
    val newFiles = findNewFiles(validTime.milliseconds)
    logInfo("New files at time " + validTime + ":\n" + newFiles.mkString("\n"))
    batchTimeToSelectedFiles += ((validTime, newFiles))
    recentlySelectedFiles ++= newFiles
    +maybe a check for {newFiles.size > 0} here could avoid this problem?+
    Some(filesToRDD(newFiles))
  }

Thanks.

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar
Priority: Minor

 spark master (1.3.0) with the SPARK-5826 patch.
 JavaStreamingContext.fileStream keeps running empty tasks when no new files are found.
 To reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. run the app below.
   3. put some text files into /testspark/watchdir.
 Every 30 seconds the Spark log shows that a new sub-task runs,
 and /testspark/resultdir/ gets a new directory of empty files every 30 seconds.
 Even when no new files have been added, a new task still runs on an empty RDD.
 {noformat}
 package my.test.hadoop.spark;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;
 import scala.Tuple2;

 public class TestStream {
     @SuppressWarnings({ "serial", "resource" })
     public static void main(String[] args) throws Exception {
         SparkConf conf = new SparkConf().setAppName("TestStream");
         JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
         jssc.checkpoint("/testspark/checkpointdir");

         Configuration jobConf = new Configuration();
         jobConf.set("my.test.fields", "fields");

         JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                 LongWritable.class, Text.class, TextInputFormat.class,
                 new Function<Path, Boolean>() {
                     @Override
                     public Boolean call(Path v1) throws Exception {
                         return true;
                     }
                 }, true, jobConf).mapToPair(
                 new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                     @Override
                     public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                         return new Tuple2<Integer, Integer>(1, 1);
                     }
                 });

         JavaPairDStream<Integer, Integer> rs = is.reduceByKey(
                 new Function2<Integer, Integer, Integer>() {
                     @Override
                     public Integer call(Integer arg0, Integer arg1) throws Exception {
                         return arg0 + arg1;
                     }
                 });
         rs.checkpoint(Durations.seconds(60));
         rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                 Integer.class, Integer.class, TextOutputFormat.class);

         jssc.start();
         jssc.awaitTermination();
     }
 }
 {noformat}






[jira] [Created] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-13 Thread Littlestar (JIRA)
Littlestar created SPARK-5795:
-

 Summary: api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not 
friendly to java
 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar


import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

The following code can't be compiled from Java:
JavaPairDStream<Integer, Integer> rs = ...
rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf);

but similar code on JavaPairRDD works fine:
JavaPairRDD<String, String> counts = ...
counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, TextOutputFormat.class, jobConf);

Maybe the current signature
  def saveAsNewAPIHadoopFiles(
      prefix: String,
      suffix: String,
      keyClass: Class[_],
      valueClass: Class[_],
      outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
      conf: Configuration = new Configuration) {
    dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
  }
could be changed to
  def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
      prefix: String,
      suffix: String,
      keyClass: Class[_],
      valueClass: Class[_],
      outputFormatClass: Class[F],
      conf: Configuration = new Configuration) {
    dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
  }
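
For what it's worth, the call appears to compile against the current signature if the Class literal is widened explicitly with an unchecked cast. This is only a call-site sketch (hypothetical helper class; OutputFormat here is the new-API org.apache.hadoop.mapreduce.OutputFormat), not a substitute for the signature change proposed above:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public class SaveStreamWorkaround {
    @SuppressWarnings("unchecked")
    public static void save(JavaPairDStream<Integer, Integer> rs, Configuration jobConf) {
        // Widen Class<TextOutputFormat> to the wildcard-bounded type the current
        // signature expects; the double cast is an unchecked conversion.
        Class<? extends OutputFormat<?, ?>> fmt =
                (Class<? extends OutputFormat<?, ?>>) (Class<?>) TextOutputFormat.class;
        rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "txt",
                Integer.class, Integer.class, fmt, jobConf);
    }
}
{noformat}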











[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-13 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319769#comment-14319769
 ] 

Littlestar commented on SPARK-5795:
---

org.apache.spark.api.java.JavaPairRDD<K, V>
{noformat}
/** Output the RDD to any Hadoop-supported file system. */
  def saveAsHadoopFile[F <: OutputFormat[_, _]](
  path: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[F],
  conf: JobConf) {
rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass, conf)
  }

  /** Output the RDD to any Hadoop-supported file system. */
  def saveAsHadoopFile[F <: OutputFormat[_, _]](
  path: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[F]) {
rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass)
  }

  /** Output the RDD to any Hadoop-supported file system, compressing with the 
supplied codec. */
  def saveAsHadoopFile[F <: OutputFormat[_, _]](
  path: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[F],
  codec: Class[_ <: CompressionCodec]) {
rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass, codec)
  }

  /** Output the RDD to any Hadoop-supported file system. */
  def saveAsNewAPIHadoopFile[F <: NewOutputFormat[_, _]](
  path: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[F],
  conf: Configuration) {
rdd.saveAsNewAPIHadoopFile(path, keyClass, valueClass, outputFormatClass, 
conf)
  }

  /**
   * Output the RDD to any Hadoop-supported storage system, using
   * a Configuration object for that storage system.
   */
  def saveAsNewAPIHadoopDataset(conf: Configuration) {
rdd.saveAsNewAPIHadoopDataset(conf)
  }

  /** Output the RDD to any Hadoop-supported file system. */
  def saveAsNewAPIHadoopFile[F <: NewOutputFormat[_, _]](
  path: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[F]) {
rdd.saveAsNewAPIHadoopFile(path, keyClass, valueClass, outputFormatClass)
  }
{noformat}

org.apache.spark.streaming.api.java.JavaPairDStream<K, V>

{noformat}
/**
   * Save each RDD in `this` DStream as a Hadoop file. The file name at each 
batch interval is
   * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
   */
  def saveAsHadoopFiles[F <: OutputFormat[K, V]](prefix: String, suffix: String) {
dstream.saveAsHadoopFiles(prefix, suffix)
  }

  /**
   * Save each RDD in `this` DStream as a Hadoop file. The file name at each 
batch interval is
   * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
   */
  def saveAsHadoopFiles(
  prefix: String,
  suffix: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[_ <: OutputFormat[_, _]]) {
dstream.saveAsHadoopFiles(prefix, suffix, keyClass, valueClass, 
outputFormatClass)
  }

  /**
   * Save each RDD in `this` DStream as a Hadoop file. The file name at each 
batch interval is
   * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
   */
  def saveAsHadoopFiles(
  prefix: String,
  suffix: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[_ <: OutputFormat[_, _]],
  conf: JobConf) {
dstream.saveAsHadoopFiles(prefix, suffix, keyClass, valueClass, 
outputFormatClass, conf)
  }

  /**
   * Save each RDD in `this` DStream as a Hadoop file. The file name at each 
batch interval is
   * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
   */
  def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[K, V]](prefix: String, suffix: String) {
dstream.saveAsNewAPIHadoopFiles(prefix, suffix)
  }

  /**
   * Save each RDD in `this` DStream as a Hadoop file. The file name at each 
batch interval is
   * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
   */
  def saveAsNewAPIHadoopFiles(
  prefix: String,
  suffix: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[_ <: NewOutputFormat[_, _]]) {
dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
outputFormatClass)
  }

  /**
   * Save each RDD in `this` DStream as a Hadoop file. The file name at each 
batch interval is
   * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
   */
  def saveAsNewAPIHadoopFiles(
  prefix: String,
  suffix: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
  conf: Configuration = new Configuration) {
dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, 
outputFormatClass, conf)
  }
{noformat}
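
To illustrate the asymmetry between the two listings above: with a method-level type parameter F, javac infers F from the Class argument and accepts the plain TextOutputFormat.class literal, which appears to be why the RDD-side call compiles from Java while the wildcard-bounded DStream overload rejects it. A minimal, self-contained compile check (hypothetical class name and output path; new-API TextOutputFormat assumed):

{noformat}
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SaveCompileCheck {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[2]", "SaveCompileCheck");
        JavaPairRDD<Text, Text> counts = sc.parallelizePairs(
                Arrays.asList(new Tuple2<Text, Text>(new Text("k"), new Text("v"))));
        // Compiles: saveAsNewAPIHadoopFile[F <: NewOutputFormat[_, _]] takes Class[F],
        // so F is inferred from TextOutputFormat.class.
        counts.saveAsNewAPIHadoopFile("/tmp/savecompilecheck-out", Text.class, Text.class,
                TextOutputFormat.class, new Configuration());
        // The analogous JavaPairDStream.saveAsNewAPIHadoopFiles call is rejected, because its
        // Class[_ <: NewOutputFormat[_, _]] parameter does not accept Class<TextOutputFormat>
        // (see the compiler error quoted in a later comment).
        sc.stop();
    }
}
{noformat}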

 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 

[jira] [Updated] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-13 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5795:
--
Attachment: TestStreamCompile.java

My test case (TestStreamCompile.java, attached) is built against Java 1.7 and the Spark 1.3 trunk.
Thanks.

 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStreamCompile.java


 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 The following code can't be compiled from Java:
 JavaPairDStream<Integer, Integer> rs = ...
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf);
 but similar code on JavaPairRDD works fine:
 JavaPairRDD<String, String> counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, TextOutputFormat.class, jobConf);
 
 Maybe the current signature
   def saveAsNewAPIHadoopFiles(
       prefix: String,
       suffix: String,
       keyClass: Class[_],
       valueClass: Class[_],
       outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
       conf: Configuration = new Configuration) {
     dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
   }
 could be changed to
   def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
       prefix: String,
       suffix: String,
       keyClass: Class[_],
       valueClass: Class[_],
       outputFormatClass: Class[F],
       conf: Configuration = new Configuration) {
     dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
   }






[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-13 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320109#comment-14320109
 ] 

Littlestar commented on SPARK-5795:
---

Is this the same problem as SPARK-5297? Thanks.


 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
 Attachments: TestStreamCompile.java


 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 The following code can't be compiled from Java:
 JavaPairDStream<Integer, Integer> rs = ...
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf);
 but similar code on JavaPairRDD works fine:
 JavaPairRDD<String, String> counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, TextOutputFormat.class, jobConf);
 
 Maybe the current signature
   def saveAsNewAPIHadoopFiles(
       prefix: String,
       suffix: String,
       keyClass: Class[_],
       valueClass: Class[_],
       outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
       conf: Configuration = new Configuration) {
     dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
   }
 could be changed to
   def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
       prefix: String,
       suffix: String,
       keyClass: Class[_],
       valueClass: Class[_],
       outputFormatClass: Class[F],
       conf: Configuration = new Configuration) {
     dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
   }






[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java

2015-02-13 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320083#comment-14320083
 ] 

Littlestar commented on SPARK-5795:
---

Error info:
The method saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?,?>>) in the type JavaPairDStream<Integer,Integer> is not applicable for the arguments (String, String, Class<Integer>, Class<Integer>, Class<TextOutputFormat>)




 api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
 -

 Key: SPARK-5795
 URL: https://issues.apache.org/jira/browse/SPARK-5795
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 The following code can't be compiled from Java:
 JavaPairDStream<Integer, Integer> rs = ...
 rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf);
 but similar code on JavaPairRDD works fine:
 JavaPairRDD<String, String> counts = ...
 counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, TextOutputFormat.class, jobConf);
 
 Maybe the current signature
   def saveAsNewAPIHadoopFiles(
       prefix: String,
       suffix: String,
       keyClass: Class[_],
       valueClass: Class[_],
       outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
       conf: Configuration = new Configuration) {
     dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
   }
 could be changed to
   def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
       prefix: String,
       suffix: String,
       keyClass: Class[_],
       valueClass: Class[_],
       outputFormatClass: Class[F],
       conf: Configuration = new Configuration) {
     dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
   }


