[jira] [Comment Edited] (SPARK-6678) select count(DISTINCT C_UID) from parquetdir may be can optimize
[ https://issues.apache.org/jira/browse/SPARK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567965#comment-15567965 ]

Littlestar edited comment on SPARK-6678 at 10/12/16 7:57 AM:
-------------------------------------------------------------

I compared it to my other Spark code that does the same thing: count distinct C_UID.
C_UID ===> split into 1000 partitions ==> each partition computes its distinct C_UIDs ==> sum the per-partition distinct counts.
I think "org.apache.spark.rdd.RDD.collect(RDD.scala:813)" is very slow, and it is not necessary here. I just need the distinct count, not each C_UID value.

was (Author: cnstar9988):
I compared it to my other Spark code that does the same thing: count distinct C_UID.
I think "org.apache.spark.rdd.RDD.collect(RDD.scala:813)" is very slow, and it is not necessary here. I just need the distinct count, not each C_UID value.

> select count(DISTINCT C_UID) from parquetdir may be can optimize
> -----------------------------------------------------------------
>
>                  Key: SPARK-6678
>                  URL: https://issues.apache.org/jira/browse/SPARK-6678
>              Project: Spark
>           Issue Type: Improvement
>           Components: SQL
>     Affects Versions: 1.3.0
>             Reporter: Littlestar
>             Priority: Minor
>
> 2.2T of parquet files (5000 files total, 100 billion records, 2 billion unique C_UID).
> When I run the following SQL, RDD.collect may be very slow:
>
> select count(DISTINCT C_UID) from parquetdir
>
> select count(DISTINCT C_UID) from parquetdir
> collect at SparkPlan.scala:83 +details
> org.apache.spark.rdd.RDD.collect(RDD.scala:813)
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:83)
> org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:415)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
> com.sun.proxy.$Proxy23.executeStatementAsync(Unknown Source)
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
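[Editor's note] The per-partition scheme described in the comment can be sketched directly against the RDD API. This is only an illustration of the idea, not code from the ticket; the RDD, partition count, and method name are assumed. Hash partitioning sends equal C_UIDs to the same partition, so the per-partition distinct counts are disjoint and can simply be summed:

{code}
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

def distinctCount(uids: RDD[String], numPartitions: Int = 1000): Long = {
  uids.map(uid => (uid, 1))
    .partitionBy(new HashPartitioner(numPartitions)) // equal C_UIDs land in the same partition
    .mapPartitions(iter => Iterator(iter.map(_._1).toSet.size.toLong))
    .sum()                                           // partitions hold disjoint key sets, so counts add
    .toLong
}
{code}

This is essentially what uids.distinct(1000).count() computes; if an approximate answer is acceptable, RDD.countApproxDistinct avoids materializing the values at all.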
[jira] [Commented] (SPARK-6678) select count(DISTINCT C_UID) from parquetdir may be can optimize
[ https://issues.apache.org/jira/browse/SPARK-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567965#comment-15567965 ]

Littlestar commented on SPARK-6678:
-----------------------------------

I compared it to my other Spark code that does the same thing: count distinct C_UID.
I think "org.apache.spark.rdd.RDD.collect(RDD.scala:813)" is very slow, and it is not necessary here. I just need the distinct count, not each C_UID value.

> select count(DISTINCT C_UID) from parquetdir may be can optimize
> -----------------------------------------------------------------
>
>                  Key: SPARK-6678
>                  URL: https://issues.apache.org/jira/browse/SPARK-6678
>              Project: Spark
>           Issue Type: Improvement
>           Components: SQL
>     Affects Versions: 1.3.0
>             Reporter: Littlestar
>             Priority: Minor
>
> 2.2T of parquet files (5000 files total, 100 billion records, 2 billion unique C_UID).
> When I run the following SQL, RDD.collect may be very slow:
>
> select count(DISTINCT C_UID) from parquetdir
>
> select count(DISTINCT C_UID) from parquetdir
> collect at SparkPlan.scala:83 +details
> org.apache.spark.rdd.RDD.collect(RDD.scala:813)
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:83)
> org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:415)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
> com.sun.proxy.$Proxy23.executeStatementAsync(Unknown Source)
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704534#comment-14704534 ]

Littlestar commented on SPARK-2883:
-----------------------------------

Spark 1.4.1: the ORC file writer relies on HiveContext and the Hive metastore.

Spark Support for ORCFile format
--------------------------------

                 Key: SPARK-2883
                 URL: https://issues.apache.org/jira/browse/SPARK-2883
             Project: Spark
          Issue Type: New Feature
          Components: Input/Output, SQL
            Reporter: Zhan Zhang
            Assignee: Zhan Zhang
            Priority: Critical
             Fix For: 1.4.0
         Attachments: 2014-09-12 07.05.24 pm Spark UI.png, 2014-09-12 07.07.19 pm jobtracker.png, orc.diff

Verify the support of OrcInputFormat in Spark, fix issues if any exist, and add documentation of its usage.
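[Editor's note] For readers hitting the same limitation: in the Spark 1.4 line, ORC I/O is indeed routed through HiveContext rather than the plain SQLContext. A minimal sketch of what that looks like; the paths are illustrative:

{code}
// Spark 1.4-era ORC write: requires a HiveContext (and thus Hive classes on
// the classpath); a plain SQLContext cannot resolve the "orc" data source.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val df = hiveContext.read.parquet("/data/events")   // any DataFrame source works here
df.write.format("orc").save("/data/events_orc")
{code}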
[jira] [Commented] (SPARK-3644) REST API for Spark application info (jobs / stages / tasks / storage info)
[ https://issues.apache.org/jira/browse/SPARK-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620328#comment-14620328 ]

Littlestar commented on SPARK-3644:
-----------------------------------

Is there any JSON API reference document for Spark 1.4.0? Thanks. I found nothing in the Spark 1.4 docs.

REST API for Spark application info (jobs / stages / tasks / storage info)
---------------------------------------------------------------------------

                 Key: SPARK-3644
                 URL: https://issues.apache.org/jira/browse/SPARK-3644
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core, Web UI
            Reporter: Josh Rosen
            Assignee: Imran Rashid
             Fix For: 1.4.0

This JIRA is a forum to draft a design proposal for a REST interface for accessing information about Spark applications, such as job / stage / task / storage status.

There have been a number of proposals to serve JSON representations of the information displayed in Spark's web UI. Given that we might redesign the pages of the web UI (and possibly re-implement the UI as a client of a REST API), the API endpoints and their responses should be independent of what we choose to display on particular web UI pages / layouts.

Let's start a discussion of what a good REST API would look like from first principles. We can discuss what URLs / endpoints expose access to data, how our JSON responses will be formatted, how fields will be named, how the API will be documented and tested, etc.

Some links for inspiration:

https://developer.github.com/v3/
http://developer.netflix.com/docs/REST_API_Reference
https://helloreverb.com/developers/swagger
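[Editor's note] The API added by this ticket is served under /api/v1 and is described on the Monitoring page of the Spark 1.4 documentation. A few illustrative calls against a running driver; host and port are whatever the driver UI uses (4040 by default):

{noformat}
curl http://localhost:4040/api/v1/applications
curl http://localhost:4040/api/v1/applications/<app-id>/jobs
curl http://localhost:4040/api/v1/applications/<app-id>/stages
{noformat}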
[jira] [Commented] (SPARK-5281) Registering table on RDD is giving MissingRequirementError
[ https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604632#comment-14604632 ]

Littlestar commented on SPARK-5281:
-----------------------------------

Was this patch merged to the Spark 1.3 branch? Thanks.

Registering table on RDD is giving MissingRequirementError
-----------------------------------------------------------

                 Key: SPARK-5281
                 URL: https://issues.apache.org/jira/browse/SPARK-5281
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.0, 1.3.1
            Reporter: sarsol
            Assignee: Iulian Dragos
            Priority: Critical
             Fix For: 1.4.0

Application crashes on this line {{rdd.registerTempTable("temp")}} in the 1.2 version when using sbt or the Eclipse Scala IDE.

Stacktrace:
{code}
Exception in thread "main" scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with primordial classloader with boot classpath [C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-library.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-reflect.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-actor.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-swing.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-compiler.jar;C:\Program Files\Java\jre7\lib\resources.jar;C:\Program Files\Java\jre7\lib\rt.jar;C:\Program Files\Java\jre7\lib\sunrsasign.jar;C:\Program Files\Java\jre7\lib\jsse.jar;C:\Program Files\Java\jre7\lib\jce.jar;C:\Program Files\Java\jre7\lib\charsets.jar;C:\Program Files\Java\jre7\lib\jfr.jar;C:\Program Files\Java\jre7\classes] not found.
    at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
    at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
    at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
    at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
    at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
    at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119)
    at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21)
    at org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115)
    at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231)
    at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231)
    at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335)
    at scala.reflect.api.Universe.typeOf(Universe.scala:59)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94)
    at org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33)
    at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111)
    at com.sar.spark.dq.poc.SparkPOC$delayedInit$body.apply(SparkPOC.scala:43)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
{code}
[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519515#comment-14519515 ]

Littlestar commented on SPARK-6461:
-----------------------------------

Same as SPARK-7193.

spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
-------------------------------------------------------------------

                 Key: SPARK-6461
                 URL: https://issues.apache.org/jira/browse/SPARK-6461
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.3.0
            Reporter: Littlestar

I used Mesos to run Spark 1.3.0 ./run-example SparkPi, but it failed.
spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos:
spark.executorEnv.PATH
spark.executorEnv.HADOOP_HOME
spark.executorEnv.JAVA_HOME

E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop fs -copyToLocal 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
sh: hadoop: command not found
[jira] [Resolved] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Littlestar resolved SPARK-6461.
-------------------------------
    Resolution: Duplicate

Missing environment in spark-1.3.1-hadoop2.4.0.tgz.
Note: spark-env.sh on the driver node only affects spark-submit (the driver node). conf/spark-env.sh inside spark-1.3.1-hadoop2.4.0.tgz works for the worker nodes.

spark.executorEnv.PATH in spark-defaults.conf is not pass to mesos
-------------------------------------------------------------------

                 Key: SPARK-6461
                 URL: https://issues.apache.org/jira/browse/SPARK-6461
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.3.0
            Reporter: Littlestar

I used Mesos to run Spark 1.3.0 ./run-example SparkPi, but it failed.
spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos:
spark.executorEnv.PATH
spark.executorEnv.HADOOP_HOME
spark.executorEnv.JAVA_HOME

E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop fs -copyToLocal 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
sh: hadoop: command not found
[jira] [Commented] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release
[ https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516801#comment-14516801 ]

Littlestar commented on SPARK-7193:
-----------------------------------

{noformat}
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1
Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 17, hpblade06): ExecutorLostFailure (executor 20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{noformat}

Spark on Mesos may need more tests for spark 1.3.1 release
----------------------------------------------------------

                 Key: SPARK-7193
                 URL: https://issues.apache.org/jira/browse/SPARK-7193
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.3.1
            Reporter: Littlestar

Spark on Mesos may need more tests for the Spark 1.3.1 release.
http://spark.apache.org/docs/latest/running-on-mesos.html
I tested Mesos 0.21.1/0.22.0/0.22.1 RC4. Simply starting ./bin/spark-shell --master mesos://host:5050 works well, but any task that needs more than one node throws the following exceptions.
{noformat}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: Lost task 10.3 in stage 0.0 (TID 127, hpblade05): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at
[jira] [Comment Edited] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release
[ https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516801#comment-14516801 ]

Littlestar edited comment on SPARK-7193 at 4/28/15 10:51 AM:
-------------------------------------------------------------

1 master + 7 nodes (Spark 1.3.1 + Mesos 0.22.0/0.22.1)
{noformat}
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1
Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 17, hpblade06): ExecutorLostFailure (executor 20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{noformat}

was (Author: cnstar9988):
{noformat}
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1
Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 17, hpblade06): ExecutorLostFailure (executor 20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{noformat}

Spark on Mesos may need more tests for spark 1.3.1 release
----------------------------------------------------------

                 Key: SPARK-7193
                 URL: https://issues.apache.org/jira/browse/SPARK-7193
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.3.1
            Reporter: Littlestar

Spark on Mesos may need more tests for the Spark 1.3.1 release.
http://spark.apache.org/docs/latest/running-on-mesos.html
I tested mesos
[jira] [Comment Edited] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release
[ https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516801#comment-14516801 ]

Littlestar edited comment on SPARK-7193 at 4/28/15 10:53 AM:
-------------------------------------------------------------

1 master + 7 nodes (Spark 1.3.1 + Mesos 0.22.0/0.22.1)
{noformat}
./spark-shell --master mesos://hpblade02:5050
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1
Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 17, hpblade06): ExecutorLostFailure (executor 20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{noformat}

was (Author: cnstar9988):
1 master + 7 nodes (Spark 1.3.1 + Mesos 0.22.0/0.22.1)
{noformat}
15/04/28 18:45:53 INFO spark.SparkContext: Running Spark version 1.3.1
Spark context available as sc.
15/04/28 18:45:57 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> distData.reduce(_+_)
---
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent failure: Lost task 7.3 in stage 0.0 (TID 17, hpblade06): ExecutorLostFailure (executor 20150427-165835-1214949568-5050-6-S0 lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{noformat}

Spark on Mesos may need more tests for spark 1.3.1 release
----------------------------------------------------------

                 Key: SPARK-7193
                 URL: https://issues.apache.org/jira/browse/SPARK-7193
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.3.1
            Reporter: Littlestar

Spark on Mesos may need more tests for spark 1.3.1
[jira] [Commented] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release
[ https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516807#comment-14516807 ]

Littlestar commented on SPARK-7193:
-----------------------------------

Exception from the logs of some Mesos worker nodes:
{noformat}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: org.apache.spark.executor.MesosExecutorBackend
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.spark.executor.MesosExecutorBackend. Program will exit.
{noformat}

Spark on Mesos may need more tests for spark 1.3.1 release
----------------------------------------------------------

                 Key: SPARK-7193
                 URL: https://issues.apache.org/jira/browse/SPARK-7193
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.3.1
            Reporter: Littlestar

Spark on Mesos may need more tests for the Spark 1.3.1 release.
http://spark.apache.org/docs/latest/running-on-mesos.html
I tested Mesos 0.21.1/0.22.0/0.22.1 RC4. Simply starting ./bin/spark-shell --master mesos://host:5050 works well, but any task that needs more than one node throws the following exceptions.
{noformat}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: Lost task 10.3 in stage 0.0 (TID 127, hpblade05): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/04/28 15:33:18 ERROR scheduler.LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at
[jira] [Resolved] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release
[ https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Littlestar resolved SPARK-7193.
-------------------------------
    Resolution: Invalid

I think the official documentation is missing some notes about Spark on Mesos.

The following worked for me: extract spark-1.3.1-bin-hadoop2.4.tgz, modify conf/spark-env.sh, repack it into a new spark-1.3.1-bin-hadoop2.4.tgz, and then put it on HDFS.
spark-env.sh sets JAVA_HOME, HADOOP_CONF_DIR, HADOOP_HOME.

Spark on Mesos may need more tests for spark 1.3.1 release
----------------------------------------------------------

                 Key: SPARK-7193
                 URL: https://issues.apache.org/jira/browse/SPARK-7193
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.3.1
            Reporter: Littlestar

Spark on Mesos may need more tests for the Spark 1.3.1 release.
http://spark.apache.org/docs/latest/running-on-mesos.html
I tested Mesos 0.21.1/0.22.0/0.22.1 RC4. Simply starting ./bin/spark-shell --master mesos://host:5050 works well, but any task that needs more than one node throws the following exceptions.
{noformat}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: Lost task 10.3 in stage 0.0 (TID 127, hpblade05): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/04/28 15:33:18 ERROR scheduler.LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onStageCompleted(EventLoggingListener.scala:165)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:32)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at
[jira] [Comment Edited] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release
[ https://issues.apache.org/jira/browse/SPARK-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518610#comment-14518610 ]

Littlestar edited comment on SPARK-7193 at 4/29/15 2:40 AM:
------------------------------------------------------------

I think the official documentation is missing some notes about Spark on Mesos.

The following worked for me: extract spark-1.3.1-bin-hadoop2.4.tgz, modify conf/spark-env.sh, repack it into a new spark-1.3.1-bin-hadoop2.4.tgz, and then put it on HDFS.
spark-env.sh sets JAVA_HOME, HADOOP_CONF_DIR, HADOOP_HOME.

was (Author: cnstar9988):
I think official document missing some notes about Spark on Mesos I worked well with following: extract spark-1.3.1-bin-hadoop2.4.tgz, and modify conf\spark-env.sh and repack with new spark-1.3.1-bin-hadoop2.4.tgz, and then put to hdfs spark-env.sh set JAVA_HOME, HADOO_CONF_DIR, HADOO_HOME

Spark on Mesos may need more tests for spark 1.3.1 release
----------------------------------------------------------

                 Key: SPARK-7193
                 URL: https://issues.apache.org/jira/browse/SPARK-7193
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.3.1
            Reporter: Littlestar

Spark on Mesos may need more tests for the Spark 1.3.1 release.
http://spark.apache.org/docs/latest/running-on-mesos.html
I tested Mesos 0.21.1/0.22.0/0.22.1 RC4. Simply starting ./bin/spark-shell --master mesos://host:5050 works well, but any task that needs more than one node throws the following exceptions.
{noformat}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: Lost task 10.3 in stage 0.0 (TID 127, hpblade05): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/04/28 15:33:18 ERROR scheduler.LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at
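[Editor's note] A sketch of the repack workaround described in the resolution above, in the thread's own console style. All paths, values, and the spark.executor.uri setting are illustrative assumptions, not taken from the ticket:

{noformat}
# Unpack the distribution, add the environment the executors need, repack, upload.
tar xzf spark-1.3.1-bin-hadoop2.4.tgz
cat >> spark-1.3.1-bin-hadoop2.4/conf/spark-env.sh <<'EOF'
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/opt/hadoop-2.4.0
export HADOOP_CONF_DIR=/opt/hadoop-2.4.0/etc/hadoop
EOF
tar czf spark-1.3.1-bin-hadoop2.4.tgz spark-1.3.1-bin-hadoop2.4
hadoop fs -put -f spark-1.3.1-bin-hadoop2.4.tgz hdfs:///user/spark/
# Then point Mesos executors at the repacked archive, e.g. in spark-defaults.conf:
#   spark.executor.uri  hdfs:///user/spark/spark-1.3.1-bin-hadoop2.4.tgz
{noformat}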
[jira] [Created] (SPARK-7193) Spark on Mesos may need more tests for spark 1.3.1 release
Littlestar created SPARK-7193:
---------------------------------

             Summary: Spark on Mesos may need more tests for spark 1.3.1 release
                 Key: SPARK-7193
                 URL: https://issues.apache.org/jira/browse/SPARK-7193
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.3.1
            Reporter: Littlestar

Spark on Mesos may need more tests for the Spark 1.3.1 release.
http://spark.apache.org/docs/latest/running-on-mesos.html
I tested Mesos 0.21.1/0.22.0/0.22.1 RC4. Simply starting ./bin/spark-shell --master mesos://host:5050 works well, but any task that needs more than one node throws the following exceptions.
{noformat}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 0.0 failed 4 times, most recent failure: Lost task 10.3 in stage 0.0 (TID 127, hpblade05): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2393)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1378)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1963)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1887)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/04/28 15:33:18 ERROR scheduler.LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onStageCompleted(EventLoggingListener.scala:165)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:32)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at
[jira] [Comment Edited] (SPARK-6151) schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
[ https://issues.apache.org/jira/browse/SPARK-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491881#comment-14491881 ]

Littlestar edited comment on SPARK-6151 at 4/14/15 1:04 AM:
------------------------------------------------------------

The HDFS block size is set once, when you first install Hadoop. blockSize can be changed per file at create time, but Spark has no way to pass a blockSize through.

FSDataOutputStream org.apache.hadoop.fs.FileSystem.create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize) throws IOException

was (Author: cnstar9988):
The HDFS block size is set once, when you first install Hadoop. blockSize can be changed at file create time.

FSDataOutputStream org.apache.hadoop.fs.FileSystem.create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize) throws IOException

schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
-----------------------------------------------------------------------------

                 Key: SPARK-6151
                 URL: https://issues.apache.org/jira/browse/SPARK-6151
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.2.1
            Reporter: Littlestar
            Priority: Trivial

How can a SchemaRDD written to a parquet file with saveAsParquetFile control the HDFS block size? A Configuration option may be needed.

Related questions by others:
http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-block-size-for-parquet-output-tt21183.html
http://qnalist.com/questions/5054892/spark-sql-parquet-and-impala
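[Editor's note] The Hadoop API cited above does take a per-file block size; it is only Spark's save path that does not expose it. A minimal sketch of that API, with illustrative path, replication, and size values:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)
// create(path, overwrite, bufferSize, replication, blockSize)
val out = fs.create(new Path("/tmp/example.dat"), true, 4096, 3.toShort, 256L * 1024 * 1024)
out.writeBytes("written with a 256 MB HDFS block size\n")
out.close()
{code}

For jobs that go through Spark's own writers, setting dfs.blocksize on the job's Hadoop Configuration (e.g. via sc.hadoopConfiguration) is a commonly suggested workaround, though whether it reaches the Parquet writer depends on the Spark version.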
[jira] [Commented] (SPARK-6151) schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
[ https://issues.apache.org/jira/browse/SPARK-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491881#comment-14491881 ]

Littlestar commented on SPARK-6151:
-----------------------------------

The HDFS block size is set once, when you first install Hadoop. blockSize can be changed at file create time.

FSDataOutputStream org.apache.hadoop.fs.FileSystem.create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize) throws IOException

schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
-----------------------------------------------------------------------------

                 Key: SPARK-6151
                 URL: https://issues.apache.org/jira/browse/SPARK-6151
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.2.1
            Reporter: Littlestar
            Priority: Trivial

How can a SchemaRDD written to a parquet file with saveAsParquetFile control the HDFS block size? A Configuration option may be needed.

Related questions by others:
http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-block-size-for-parquet-output-tt21183.html
http://qnalist.com/questions/5054892/spark-sql-parquet-and-impala
[jira] [Created] (SPARK-6678) select count(DISTINCT C_UID) from parquetdir may be can optimize
Littlestar created SPARK-6678:
---------------------------------

             Summary: select count(DISTINCT C_UID) from parquetdir may be can optimize
                 Key: SPARK-6678
                 URL: https://issues.apache.org/jira/browse/SPARK-6678
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.3.0
            Reporter: Littlestar
            Priority: Minor

2.2T of parquet files (5000 files total, 100 billion records, 2 billion unique C_UID).
When I run the following SQL, RDD.collect may be very slow:

select count(DISTINCT C_UID) from parquetdir

select count(DISTINCT C_UID) from parquetdir
collect at SparkPlan.scala:83 +details
org.apache.spark.rdd.RDD.collect(RDD.scala:813)
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:83)
org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:415)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
com.sun.proxy.$Proxy23.executeStatementAsync(Unknown Source)
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
[jira] [Commented] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead
[ https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387748#comment-14387748 ]

Littlestar commented on SPARK-6239:
-----------------------------------

If I want to set minCount=2, I must use .setMinSupport(1.99/(rdd.count())) because of double precision.
How can I reopen this PR and mark its relation to pull/5246? Thanks.

Spark MLlib fpm#FPGrowth minSupport should use long instead
------------------------------------------------------------

                 Key: SPARK-6239
                 URL: https://issues.apache.org/jira/browse/SPARK-6239
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.3.0
            Reporter: Littlestar
            Priority: Minor

Spark MLlib fpm#FPGrowth minSupport should use a long instead:
val minCount = math.ceil(minSupport * count).toLong
because:
1. [count] the number of records in the dataset is not known before reading.
2. [minSupport] double precision.

From mahout#FPGrowthDriver.java:
addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present." + " Default Value: 3", "3");

I just want to set minCount=2 for a test. Thanks.
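[Editor's note] A small sketch of the rounding hazard being described: FPGrowth derives its absolute threshold as math.ceil(minSupport * count).toLong, so a fraction chosen to mean exactly 2 occurrences may not survive the round trip. The count value below is illustrative:

{code}
val count = 12345678L                                  // illustrative dataset size
val naive  = math.ceil((2.0  / count) * count).toLong  // intended 2; round-off can push it to 3
val padded = math.ceil((1.99 / count) * count).toLong  // the 1.99 workaround: ceils safely to 2
println(s"naive=$naive padded=$padded")
{code}

The 1.99 trick works because the product lands near 1.99 rather than near the integer boundary at 2.0, so ceil cannot overshoot the intended threshold.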
[jira] [Comment Edited] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead
[ https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387748#comment-14387748 ]

Littlestar edited comment on SPARK-6239 at 3/31/15 1:09 AM:
------------------------------------------------------------

I would imagine a relative value is usually more useful. But when recnum=12345678 and minSupport=0.003, recnum*minSupport lands near an integer, and some results that differ by only a small amount are lost because of double precision.

was (Author: cnstar9988):
If I want to set minCount=2, I must use .setMinSupport(1.99/(rdd.count())) because of double precision.
How can I reopen this PR and mark its relation to pull/5246? Thanks.

Spark MLlib fpm#FPGrowth minSupport should use long instead
------------------------------------------------------------

                 Key: SPARK-6239
                 URL: https://issues.apache.org/jira/browse/SPARK-6239
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.3.0
            Reporter: Littlestar
            Priority: Minor

Spark MLlib fpm#FPGrowth minSupport should use a long instead:
val minCount = math.ceil(minSupport * count).toLong
because:
1. [count] the number of records in the dataset is not known before reading.
2. [minSupport] double precision.

From mahout#FPGrowthDriver.java:
addOption("minSupport", "s", "(Optional) The minimum number of times a co-occurrence must be present." + " Default Value: 3", "3");

I just want to set minCount=2 for a test. Thanks.
[jira] [Comment Edited] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375599#comment-14375599 ]

Littlestar edited comment on SPARK-1702 at 3/23/15 11:00 AM:
-------------------------------------------------------------

I hit this on Spark 1.3.0 + Mesos 0.21.1 with run-example SparkPi:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: org.apache.spark.executor.MesosExecutorBackend
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.spark.executor.MesosExecutorBackend

was (Author: cnstar9988):
I hit this on Spark 1.3.0 + Mesos 0.21.1 with run-example SparkPi.

Mesos executor won't start because of a ClassNotFoundException
--------------------------------------------------------------

                 Key: SPARK-1702
                 URL: https://issues.apache.org/jira/browse/SPARK-1702
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.0.0
            Reporter: Bouke van der Bijl
              Labels: executors, mesos, spark

Some discussion here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html
Fix here (which is probably not the right fix): https://github.com/apache/spark/pull/620
This was broken in v0.9.0, was fixed in v0.9.1 and is now broken again.

Error in Mesos executor stderr:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0502 17:31:42.672224 14688 exec.cpp:131] Version: 0.18.0
I0502 17:31:42.674959 14707 exec.cpp:205] Executor registered on slave 20140501-182306-16842879-5050-10155-0
14/05/02 17:31:42 INFO MesosExecutorBackend: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/05/02 17:31:42 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20140501-182306-16842879-5050-10155-0
14/05/02 17:31:43 INFO SecurityManager: Changing view acls to: vagrant
14/05/02 17:31:43 INFO SecurityManager: SecurityManager, is authentication enabled: false are ui acls enabled: false users with view permissions: Set(vagrant)
14/05/02 17:31:43 INFO Slf4jLogger: Slf4jLogger started
14/05/02 17:31:43 INFO Remoting: Starting remoting
14/05/02 17:31:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@localhost:50843]
14/05/02 17:31:43 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@localhost:50843]
java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:165)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:176)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:106)
    at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:56)
Exception in thread "Thread-0"
I0502 17:31:43.710039 14707 exec.cpp:412] Deactivating the executor libprocess

The problem is that it can't find the class.
[jira] [Comment Edited] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375580#comment-14375580 ] Littlestar edited comment on SPARK-6461 at 3/23/15 9:04 AM:
When I add MESOS_HADOOP_CONF_DIR to every mesos-master-env.sh and mesos-slave-env.sh, it throws the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: org.apache.spark.executor.MesosExecutorBackend
Similar to https://issues.apache.org/jira/browse/SPARK-1702
was (Author: cnstar9988): When I add MESOS_HADOOP_CONF_DIR to every mesos-master-env.sh and mesos-slave-env.sh, it throws the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: org.apache.spark.executor.MesosExecutorBackend
Similar to https://github.com/apache/spark/pull/620
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.3.0
> Reporter: Littlestar
>
> I run Spark 1.3.0 on Mesos with ./run-example SparkPi, but it fails: spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos.
> spark.executorEnv.PATH
> spark.executorEnv.HADOOP_HOME
> spark.executorEnv.JAVA_HOME
> E0323 14:24:36.400635 11355 fetcher.cpp:109] HDFS copyToLocal failed: hadoop fs -copyToLocal 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S3/frameworks/20150323-133400-1214949568-5050-15440-0007/executors/20150323-100710-1214949568-5050-3453-S3/runs/915b40d8-f7c4-428a-9df8-ac9804c6cd21/spark-1.3.0-bin-2.4.0.tar.gz'
> sh: hadoop: command not found
[jira] [Commented] (SPARK-1480) Choose classloader consistently inside of Spark codebase
[ https://issues.apache.org/jira/browse/SPARK-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387748#comment-14387748 ] Littlestar commented on SPARK-1480:
I hit this bug on Spark 1.3.0 + Mesos 0.21.1, 100% reproducible:
I0323 16:32:18.933440 14504 fetcher.cpp:64] Extracted resource '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S4/frameworks/20150323-152848-1214949568-5050-21134-0009/executors/20150323-100710-1214949568-5050-3453-S4/runs/3d8f22f5-7fed-44ed-b5f9-98a219133911/spark-1.3.0-bin-2.4.0.tar.gz' into '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S4/frameworks/20150323-152848-1214949568-5050-21134-0009/executors/20150323-100710-1214949568-5050-3453-S4/runs/3d8f22f5-7fed-44ed-b5f9-98a219133911'
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: org.apache.spark.executor.MesosExecutorBackend
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.spark.executor.MesosExecutorBackend
> Choose classloader consistently inside of Spark codebase
> ----------------------------------------------------------
> Key: SPARK-1480
> URL: https://issues.apache.org/jira/browse/SPARK-1480
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Patrick Wendell
> Assignee: Patrick Wendell
> Priority: Blocker
> Fix For: 1.0.0
>
> The Spark codebase is not always consistent on which classloader it uses when classloaders are explicitly passed to things like serializers. This caused SPARK-1403 and also causes a bug where, when the driver has a modified context classloader, it is not translated correctly in local mode to the (local) executor. In most cases what we want is the following behavior:
> 1. If there is a context classloader on the thread, use that.
> 2. Otherwise use the classloader that loaded Spark.
> We should just have a utility function for this and call that function whenever we need to get a classloader. Note that SPARK-1403 is a workaround for this exact problem (it sets the context classloader because downstream code assumes it is set). Once this gets fixed in a more general way, SPARK-1403 can be reverted.
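The two-step policy described in the issue is small enough to sketch directly; Spark's eventual fix shipped a similar helper, but the name and placement below are illustrative:
{noformat}
// 1. If there is a context classloader on the thread, use that.
// 2. Otherwise fall back to the classloader that loaded the Spark classes.
def getContextOrSparkClassLoader: ClassLoader =
  Option(Thread.currentThread().getContextClassLoader)
    .getOrElse(getClass.getClassLoader)
{noformat}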
[jira] [Created] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
Littlestar created SPARK-6461:
Summary: spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
Key: SPARK-6461
URL: https://issues.apache.org/jira/browse/SPARK-6461
Project: Spark
Issue Type: Bug
Components: Scheduler
Affects Versions: 1.3.0
Reporter: Littlestar
Description: I run Spark 1.3.0 on Mesos with ./run-example SparkPi, but it fails: spark.executorEnv.PATH, spark.executorEnv.HADOOP_HOME, and spark.executorEnv.JAVA_HOME from spark-defaults.conf are not passed to Mesos (full description and fetcher log quoted above).
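As a cross-check, the same variables can be set programmatically via SparkConf.setExecutorEnv, bypassing spark-defaults.conf parsing entirely. A sketch using this cluster's paths; the master URL is assumed from the HDFS URI in the log and this is not a confirmed fix:
{noformat}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("SparkPi")
  .setMaster("mesos://192.168.1.9:5050") // master host assumed, not from the ticket
  .setExecutorEnv("JAVA_HOME", "/home/test/jdk")
  .setExecutorEnv("HADOOP_HOME", "/home/test/hadoop-2.4.0")
  .setExecutorEnv("PATH",
    "/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin")
val sc = new SparkContext(conf)
{noformat}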
[jira] [Comment Edited] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375468#comment-14375468 ] Littlestar edited comment on SPARK-6461 at 3/23/15 8:39 AM:
Each Mesos slave node has a JDK and a Hadoop DataNode. I also added the following settings to mesos-master-env.sh and mesos-slave-env.sh:
export MESOS_JAVA_HOME=/home/test/jdk
export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export MESOS_HADOOP_CONF_DIR=/home/test/hadoop-2.4.0/etc/hadoop
export MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin
It still fails with: /usr/bin/env: bash: No such file or directory
Thanks.
was (Author: cnstar9988): Each Mesos slave node has a JDK and a Hadoop DataNode. I also added the following settings to mesos-master-env.sh and mesos-slave-env.sh:
export MESOS_JAVA_HOME=/home/test/jdk
export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin
It still fails with: /usr/bin/env: bash: No such file or directory
Thanks.
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
[jira] [Comment Edited] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375580#comment-14375580 ] Littlestar edited comment on SPARK-6461 at 3/23/15 8:49 AM:
When I add MESOS_HADOOP_CONF_DIR to every mesos-master-env.sh and mesos-slave-env.sh, it throws the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: org.apache.spark.executor.MesosExecutorBackend
Similar to https://github.com/apache/spark/pull/620
was (Author: cnstar9988): When I add MESOS_HADOOP_CONF_DIR to every mesos-master-env.sh and mesos-slave-env.sh, it throws the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: org.apache.spark.executor.MesosExecutorBackend
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
[jira] [Commented] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375599#comment-14375599 ] Littlestar commented on SPARK-1702:
I met this on Spark 1.3.0 + Mesos 0.21.1.
> Mesos executor won't start because of a ClassNotFoundException
> ---------------------------------------------------------------
> Key: SPARK-1702
> URL: https://issues.apache.org/jira/browse/SPARK-1702
[jira] [Comment Edited] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375599#comment-14375599 ] Littlestar edited comment on SPARK-1702 at 3/23/15 9:05 AM:
I met this on Spark 1.3.0 + Mesos 0.21.1 with run-example SparkPi.
was (Author: cnstar9988): I met this on Spark 1.3.0 + Mesos 0.21.1
> Mesos executor won't start because of a ClassNotFoundException
> ---------------------------------------------------------------
> Key: SPARK-1702
> URL: https://issues.apache.org/jira/browse/SPARK-1702
[jira] [Commented] (SPARK-1480) Choose classloader consistently inside of Spark codebase
[ https://issues.apache.org/jira/browse/SPARK-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375576#comment-14375576 ] Littlestar commented on SPARK-1480:
Same as https://issues.apache.org/jira/browse/SPARK-6461. Running run-example SparkPi reproduces this bug.
> Choose classloader consistently inside of Spark codebase
> ----------------------------------------------------------
> Key: SPARK-1480
> URL: https://issues.apache.org/jira/browse/SPARK-1480
[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375468#comment-14375468 ] Littlestar commented on SPARK-6461:
Each Mesos slave node has a JDK and a Hadoop DataNode. I also added the following settings to mesos-master-env.sh and mesos-slave-env.sh:
export MESOS_JAVA_HOME=/home/test/jdk
export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin
It still fails with: /usr/bin/env: bash: No such file or directory
Thanks.
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
[jira] [Commented] (SPARK-1403) Spark on Mesos does not set Thread's context class loader
[ https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375583#comment-14375583 ] Littlestar commented on SPARK-1403:
I want to reopen this bug because I can reproduce it on Spark 1.3.0 + Mesos 0.21.1 with run-example SparkPi.
> Spark on Mesos does not set Thread's context class loader
> -----------------------------------------------------------
> Key: SPARK-1403
> URL: https://issues.apache.org/jira/browse/SPARK-1403
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.0
> Environment: ubuntu 12.04 on vagrant
> Reporter: Bharath Bhushan
> Priority: Blocker
> Fix For: 1.0.0
>
> I can run Spark 0.9.0 on Mesos but not Spark 1.0.0. This is because the Spark executor on the Mesos slave throws a java.lang.ClassNotFoundException for org.apache.spark.serializer.JavaSerializer. The lengthy discussion is here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513
[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375580#comment-14375580 ] Littlestar commented on SPARK-6461:
When I add MESOS_HADOOP_CONF_DIR to every mesos-master-env.sh and mesos-slave-env.sh, it throws the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/executor/MesosExecutorBackend
Caused by: java.lang.ClassNotFoundException: org.apache.spark.executor.MesosExecutorBackend
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
[jira] [Comment Edited] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375468#comment-14375468 ] Littlestar edited comment on SPARK-6461 at 3/23/15 9:29 AM:
Each Mesos slave node has a JDK and a Hadoop DataNode. Now I add the following settings to mesos-master-env.sh and mesos-slave-env.sh:
export MESOS_JAVA_HOME=/home/test/jdk
export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export MESOS_HADOOP_CONF_DIR=/home/test/hadoop-2.4.0/etc/hadoop
export MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin
It still fails with: /usr/bin/env: bash: No such file or directory
Thanks.
was (Author: cnstar9988): Each Mesos slave node has a JDK and a Hadoop DataNode. I also added the following settings to mesos-master-env.sh and mesos-slave-env.sh:
export MESOS_JAVA_HOME=/home/test/jdk
export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export MESOS_HADOOP_CONF_DIR=/home/test/hadoop-2.4.0/etc/hadoop
export MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin
It still fails with: /usr/bin/env: bash: No such file or directory
Thanks.
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375478#comment-14375478 ] Littlestar commented on SPARK-6461:
In spark/bin, some shell scripts use #!/usr/bin/env bash. I changed #!/usr/bin/env bash to #!/bin/bash and that worked.
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
[jira] [Commented] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375508#comment-14375508 ] Littlestar commented on SPARK-6461:
Per http://spark.apache.org/docs/latest/running-on-mesos.html, I can run ./bin/spark-shell --master mesos://host:5050 fine, but run-example SparkPi fails. Was SparkPi tested on Mesos for Spark 1.3.0? Thanks.
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
[jira] [Commented] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375612#comment-14375612 ] Littlestar commented on SPARK-6213:
Maybe changing protected[sql] def selectFilters(filters: Seq[Expression]) into a public, statically callable Java-friendly class is an easy way.
> sql.catalyst.expressions.Expression is not friendly to java
> -------------------------------------------------------------
> Key: SPARK-6213
> URL: https://issues.apache.org/jira/browse/SPARK-6213
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.0
> Reporter: Littlestar
> Priority: Minor
>
> sql.sources.CatalystScan#
> public RDD<Row> buildScan(Seq<Attribute> requiredColumns, Seq<Expression> filters)
> I use Java to extend BaseRelation, but sql.catalyst.expressions.Expression is not friendly to Java; it can't be iterated from Java, e.g. NodeName, NodeType, FuncName, FuncArgs.
> DataSourceStrategy.scala#selectFilters
> {noformat}
> /**
>  * Selects Catalyst predicate [[Expression]]s which are convertible into data source [[Filter]]s,
>  * and convert them.
>  */
> protected[sql] def selectFilters(filters: Seq[Expression]) = {
>   def translate(predicate: Expression): Option[Filter] = predicate match {
>     case expressions.EqualTo(a: Attribute, Literal(v, _)) =>
>       Some(sources.EqualTo(a.name, v))
>     case expressions.EqualTo(Literal(v, _), a: Attribute) =>
>       Some(sources.EqualTo(a.name, v))
>     case expressions.GreaterThan(a: Attribute, Literal(v, _)) =>
>       Some(sources.GreaterThan(a.name, v))
>     case expressions.GreaterThan(Literal(v, _), a: Attribute) =>
>       Some(sources.LessThan(a.name, v))
>     case expressions.LessThan(a: Attribute, Literal(v, _)) =>
>       Some(sources.LessThan(a.name, v))
>     case expressions.LessThan(Literal(v, _), a: Attribute) =>
>       Some(sources.GreaterThan(a.name, v))
>     case expressions.GreaterThanOrEqual(a: Attribute, Literal(v, _)) =>
>       Some(sources.GreaterThanOrEqual(a.name, v))
>     case expressions.GreaterThanOrEqual(Literal(v, _), a: Attribute) =>
>       Some(sources.LessThanOrEqual(a.name, v))
>     case expressions.LessThanOrEqual(a: Attribute, Literal(v, _)) =>
>       Some(sources.LessThanOrEqual(a.name, v))
>     case expressions.LessThanOrEqual(Literal(v, _), a: Attribute) =>
>       Some(sources.GreaterThanOrEqual(a.name, v))
>     case expressions.InSet(a: Attribute, set) =>
>       Some(sources.In(a.name, set.toArray))
>     case expressions.IsNull(a: Attribute) =>
>       Some(sources.IsNull(a.name))
>     case expressions.IsNotNull(a: Attribute) =>
>       Some(sources.IsNotNull(a.name))
>     case expressions.And(left, right) =>
>       (translate(left) ++ translate(right)).reduceOption(sources.And)
>     case expressions.Or(left, right) =>
>       for {
>         leftFilter <- translate(left)
>         rightFilter <- translate(right)
>       } yield sources.Or(leftFilter, rightFilter)
>     case expressions.Not(child) =>
>       translate(child).map(sources.Not)
>     case _ => None
>   }
>   filters.flatMap(translate)
> }
> {noformat}
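A sketch of the public bridge the comment asks for: re-expose the translation above as an ordinary public method, so a Java BaseRelation can consume the flat sources.Filter classes instead of pattern-matching Catalyst trees. JavaFilterBridge is a hypothetical name, and only a few of the cases are spelled out:
{noformat}
import org.apache.spark.sql.catalyst.expressions
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, Literal}
import org.apache.spark.sql.sources
import scala.collection.JavaConverters._

object JavaFilterBridge {
  // Java-callable: java.util.List in, java.util.List out.
  def toFilters(filters: java.util.List[Expression]): java.util.List[sources.Filter] = {
    def translate(predicate: Expression): Option[sources.Filter] = predicate match {
      case expressions.EqualTo(a: Attribute, Literal(v, _)) => Some(sources.EqualTo(a.name, v))
      case expressions.GreaterThan(a: Attribute, Literal(v, _)) => Some(sources.GreaterThan(a.name, v))
      case expressions.IsNull(a: Attribute) => Some(sources.IsNull(a.name))
      // ... remaining cases exactly as in selectFilters above ...
      case _ => None
    }
    filters.asScala.flatMap(translate).asJava
  }
}
{noformat}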
[jira] [Issue Comment Deleted] (SPARK-6461) spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
[ https://issues.apache.org/jira/browse/SPARK-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6461:
Comment: was deleted (was: In spark/bin, some shell scripts use #!/usr/bin/env bash. I changed #!/usr/bin/env bash to #!/bin/bash and that worked.)
> spark.executorEnv.PATH in spark-defaults.conf is not passed to Mesos
> ----------------------------------------------------------------------
> Key: SPARK-6461
> URL: https://issues.apache.org/jira/browse/SPARK-6461
[jira] [Commented] (SPARK-6240) Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for Large Sets
[ https://issues.apache.org/jira/browse/SPARK-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354890#comment-14354890 ] Littlestar commented on SPARK-6240:
OK, I see, thanks. I only noticed that Array[Item] is not a distributed structure.
> Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for Large Sets
> --------------------------------------------------------------------------------------
> Key: SPARK-6240
> URL: https://issues.apache.org/jira/browse/SPARK-6240
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Littlestar
> Priority: Minor
>
> Spark MLlib fpm#FPGrowth genFreqItems uses Array[Item], which may run out of memory for large item sets:
> {noformat}
> private def genFreqItems[Item: ClassTag](
>     data: RDD[Array[Item]],
>     minCount: Long,
>     partitioner: Partitioner): Array[Item] = {
>   data.flatMap { t =>
>     val uniq = t.toSet
>     if (t.size != uniq.size) {
>       throw new SparkException(s"Items in a transaction must be unique but got ${t.toSeq}.")
>     }
>     t
>   }.map(v => (v, 1L))
>     .reduceByKey(partitioner, _ + _)
>     .filter(_._2 >= minCount)
>     .collect()
>     .sortBy(-_._2)
>     .map(_._1)
> }
> {noformat}
> I used 10*1*1 records for the test, to output all co-occurrence pairs.
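For contrast, if only the number of frequent items were needed, the computation could stay distributed. A sketch reusing data and minCount from the quoted method; the real genFreqItems does need the items on the driver to build the FP-tree, so this is illustration, not a patch:
{noformat}
// Count frequent items without materializing them on the driver.
val numFreqItems = data
  .flatMap(_.toSet)                        // deduplicate items per transaction
  .map(item => (item, 1L))
  .reduceByKey(_ + _)
  .filter { case (_, c) => c >= minCount }
  .count()                                 // stays distributed; no collect()
{noformat}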
[jira] [Commented] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead
[ https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354829#comment-14354829 ] Littlestar commented on SPARK-6239:
When using FPGrowthModel, the number of input records is unknown before reading. I suggest either changing the meaning of minSupport or adding a setMinCount.
If I want to set minCount=2, I must use .setMinSupport(1.99/(rdd.count())) because of double precision in
val minCount = math.ceil(minSupport * count).toLong
> Spark MLlib fpm#FPGrowth minSupport should use long instead
> ------------------------------------------------------------
> Key: SPARK-6239
> URL: https://issues.apache.org/jira/browse/SPARK-6239
[jira] [Updated] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead
[ https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6239:
Summary: Spark MLlib fpm#FPGrowth minSupport should use long instead (was: Spark MLlib fpm#FPGrowth minSupport should use double instead)
Description: the first line now reads "should use long instead" (was: "should use double instead"); the rest of the description is unchanged and is quoted in full above.
> Spark MLlib fpm#FPGrowth minSupport should use long instead
> ------------------------------------------------------------
> Key: SPARK-6239
> URL: https://issues.apache.org/jira/browse/SPARK-6239
[jira] [Created] (SPARK-6240) Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for Large Sets
Littlestar created SPARK-6240:
Summary: Spark MLlib fpm#FPGrowth genFreqItems use Array[Item] may outOfMemory for Large Sets
Key: SPARK-6240
URL: https://issues.apache.org/jira/browse/SPARK-6240
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor
Description: genFreqItems collects all frequent items into a driver-side Array[Item], which may run out of memory for large item sets; the full method is quoted in the comment above.
[jira] [Commented] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use long instead
[ https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354450#comment-14354450 ] Littlestar commented on SPARK-6239:
I just want to set minCount=2 to test 10*1*1 records:
FPGrowthModel<String> model = new FPGrowth()
  .setMinSupport(2/(10*1.0*1.0))
  .setNumPartitions(500)
  .run(maps);
I think using minCount is better than minSupport.
> Spark MLlib fpm#FPGrowth minSupport should use long instead
> ------------------------------------------------------------
> Key: SPARK-6239
> URL: https://issues.apache.org/jira/browse/SPARK-6239
[jira] [Created] (SPARK-6239) Spark MLlib fpm#FPGrowth minSupport should use double instead
Littlestar created SPARK-6239:
Summary: Spark MLlib fpm#FPGrowth minSupport should use double instead
Key: SPARK-6239
URL: https://issues.apache.org/jira/browse/SPARK-6239
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.3.0
Reporter: Littlestar
Priority: Minor
Description: as quoted in full above, with the original first-line wording "should use double instead".
[jira] [Commented] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351479#comment-14351479 ] Littlestar commented on SPARK-6213:
Scala code like the above is very hard to translate to Java.
> sql.catalyst.expressions.Expression is not friendly to java
> -------------------------------------------------------------
> Key: SPARK-6213
> URL: https://issues.apache.org/jira/browse/SPARK-6213
[jira] [Updated] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6213:
Description: the list of examples now reads "such as NodeName, NodeType, FunctionName, FuncArgs" (was: "such as NodeName, NodeType"); the rest of the description, including the selectFilters listing, is unchanged and is quoted in full above.
> sql.catalyst.expressions.Expression is not friendly to java
> -------------------------------------------------------------
> Key: SPARK-6213
> URL: https://issues.apache.org/jira/browse/SPARK-6213
[jira] [Updated] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6213:
Description: "FunctionName" was shortened to "FuncName" in the list of examples (now "such as NodeName, NodeType, FuncName, FuncArgs"); the rest of the description is unchanged.
> sql.catalyst.expressions.Expression is not friendly to java
> -------------------------------------------------------------
> Key: SPARK-6213
> URL: https://issues.apache.org/jira/browse/SPARK-6213
[jira] [Created] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java
Littlestar created SPARK-6213:
Summary: sql.catalyst.expressions.Expression is not friendly to java
Key: SPARK-6213
URL: https://issues.apache.org/jira/browse/SPARK-6213
Project: Spark
Issue Type: Improvement
Reporter: Littlestar
Priority: Minor
Description: as quoted in full above; the original version listed only "such as NodeName, NodeType".
[jira] [Updated] (SPARK-6213) sql.catalyst.expressions.Expression is not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6213:
Component/s: SQL
Affects Version/s: 1.3.0
> sql.catalyst.expressions.Expression is not friendly to java
> -------------------------------------------------------------
> Key: SPARK-6213
> URL: https://issues.apache.org/jira/browse/SPARK-6213
[jira] [Commented] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods
[ https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344813#comment-14344813 ] Littlestar commented on SPARK-6049: --- I tested 1.3.0 RC1; it fixes SPARK-3675 and SPARK-4865. It works for me. HiveThriftServer2 may expose Inheritable methods Key: SPARK-6049 URL: https://issues.apache.org/jira/browse/SPARK-6049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Labels: HiveThriftServer2 Could HiveThriftServer2 expose inheritable methods? HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app. My app uses JavaSQLContext and registerTempTable. I want to expose these temp tables through HiveThriftServer2 (the JDBC server). Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
[ https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346349#comment-14346349 ] Littlestar commented on SPARK-5834: --- Spark 1.3.0 RC1/RC2 bundles httpclient 4.3.6; it works OK. spark 1.2.1 official package bundled with httpclient 4.1.2 is too old -- Key: SPARK-5834 URL: https://issues.apache.org/jira/browse/SPARK-5834 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only bundles httpclient 4.1.2. Some Spark modules require httpclient 4.2 and above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
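Editorial note: the version.properties file referenced in this issue can also be read at runtime to confirm which httpclient an assembly actually bundles. A minimal sketch; treating the info.release property key as an assumption about that file's contents:
{noformat}
import java.io.InputStream;
import java.util.Properties;

// Sketch: load org/apache/http/version.properties from whatever jar wins on
// the classpath and print the release it declares.
public final class HttpClientVersionProbe {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        try (InputStream in = HttpClientVersionProbe.class
                .getResourceAsStream("/org/apache/http/version.properties")) {
            if (in == null) {
                System.out.println("httpclient not on classpath");
                return;
            }
            props.load(in);
        }
        // "info.release" is assumed to hold the version string.
        System.out.println("bundled httpclient: " + props.getProperty("info.release"));
    }
}
{noformat}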
[jira] [Updated] (SPARK-6151) schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
[ https://issues.apache.org/jira/browse/SPARK-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6151: -- Component/s: SQL schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size --- Key: SPARK-6151 URL: https://issues.apache.org/jira/browse/SPARK-6151 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Trivial How can a SchemaRDD written to a Parquet file with saveAsParquetFile control the HDFS block size? A Configuration option may be needed. Related questions by others: http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-block-size-for-parquet-output-tt21183.html http://qnalist.com/questions/5054892/spark-sql-parquet-and-impala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
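Editorial note: absent a dedicated API, one plausible workaround is to set the block-size keys on the SparkContext's Hadoop Configuration before writing; "parquet.block.size" is Parquet's row-group size key and "dfs.blocksize" the HDFS block size. Whether saveAsParquetFile honors them in 1.2.1 is an assumption, and the paths below are hypothetical:
{noformat}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

public final class ParquetBlockSizeSketch {
    public static void main(String[] args) {
        JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("ParquetBlockSize"));
        // Assumption: the Parquet output path reads these keys from the shared Hadoop conf.
        ctx.hadoopConfiguration().setLong("dfs.blocksize", 256L * 1024 * 1024);
        ctx.hadoopConfiguration().setInt("parquet.block.size", 256 * 1024 * 1024);
        JavaSQLContext sqlCtx = new JavaSQLContext(ctx);
        JavaSchemaRDD rdd = sqlCtx.parquetFile("/some/input");   // hypothetical path
        rdd.saveAsParquetFile("/some/output");                   // hypothetical path
        ctx.stop();
    }
}
{noformat}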
[jira] [Created] (SPARK-6151) schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size
Littlestar created SPARK-6151: - Summary: schemaRDD to parquetfile with saveAsParquetFile control the HDFS block size Key: SPARK-6151 URL: https://issues.apache.org/jira/browse/SPARK-6151 Project: Spark Issue Type: Improvement Affects Versions: 1.2.1 Reporter: Littlestar Priority: Trivial How can a SchemaRDD written to a Parquet file with saveAsParquetFile control the HDFS block size? A Configuration option may be needed. Related questions by others: http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-block-size-for-parquet-output-tt21183.html http://qnalist.com/questions/5054892/spark-sql-parquet-and-impala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods
[ https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340018#comment-14340018 ] Littlestar edited comment on SPARK-6049 at 3/3/15 9:59 AM: --- Thanks. I can call it OK via the static method, but no tables are found. Do I need to open a new JIRA issue?
{noformat}
SparkConf sparkConf = new SparkConf().setAppName("ParquetFileQuery");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
SQLContext sqlCtx = new SQLContext(ctx);
HiveContext hiveCtx = new HiveContext(ctx.sc());
JavaSchemaRDD mids = sqlCtx.parquetFile("/parquetdir");
hiveCtx.registerDataFrameAsTable(mids, "demo");
HiveThriftServer2.startWithContext(hiveCtx);
{noformat}
was (Author: cnstar9988): Thanks. I can call it OK via the static method, but no tables are found. Do I need to open a new JIRA issue?
{noformat}
SparkConf sparkConf = new SparkConf().setAppName("ParquetFileQuery").set("spark.executor.memory", "4g");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaSQLContext sqlCtx = new JavaSQLContext(ctx);
JavaSchemaRDD mids = sqlCtx.parquetFile("/parquetdir");
mids.registerTempTable("demo");
HiveThriftServer2.startWithContext(new HiveContext(ctx.sc()));
{noformat}
HiveThriftServer2 may expose Inheritable methods Key: SPARK-6049 URL: https://issues.apache.org/jira/browse/SPARK-6049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Labels: HiveThriftServer2 Could HiveThriftServer2 expose inheritable methods? HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app. My app uses JavaSQLContext and registerTempTable. I want to expose these temp tables through HiveThriftServer2 (the JDBC server). Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
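Editorial note: the likely reason the first version found no tables is that temp-table registrations live in a per-context catalog, so a table registered on one SQLContext is invisible to a different HiveContext. That reading is an assumption, not stated in the thread. A minimal sketch of the working pattern with Spark 1.3 DataFrame APIs (path and table name taken from the report):
{noformat}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;

public final class ThriftServerWithTempTable {
    public static void main(String[] args) {
        JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("ParquetFileQuery"));
        // Register the table on the SAME HiveContext the Thrift server will serve.
        HiveContext hiveCtx = new HiveContext(ctx.sc());
        DataFrame mids = hiveCtx.parquetFile("/parquetdir");
        mids.registerTempTable("demo");
        HiveThriftServer2.startWithContext(hiveCtx);
    }
}
{noformat}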
[jira] [Updated] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java
[ https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6131: -- Description: I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\*** Thanks. was: I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ Thanks. Spark 1.3.0 (RC1) missing some source files in sql.api.java --- Key: SPARK-6131 URL: https://issues.apache.org/jira/browse/SPARK-6131 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Littlestar Priority: Critical I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\*** Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
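Editorial note: the missing file is consistent with the 1.3 API unification, in which the Java-specific wrappers were folded into SQLContext itself rather than accidentally dropped from the tarball; that interpretation is ours, not stated in this thread. A minimal sketch of the 1.3 replacement pattern:
{noformat}
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Sketch: in 1.3, SQLContext is callable from Java directly; JavaSQLContext
// is gone and parquetFile returns a DataFrame instead of a JavaSchemaRDD.
public final class JavaSqlContextMigration {
    public static void main(String[] args) {
        JavaSparkContext ctx = new JavaSparkContext("local[2]", "migration-sketch");
        SQLContext sqlCtx = new SQLContext(ctx.sc());
        DataFrame df = sqlCtx.parquetFile("/parquetdir"); // path reused from this thread
        df.registerTempTable("demo");
        ctx.stop();
    }
}
{noformat}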
[jira] [Issue Comment Deleted] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java
[ https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6131: -- Comment: was deleted (was: spark-1.2.1 is OK: spark-1.2.1-src.jar\sql\core\src\main\scala\org\apache\spark\sql\api\JavaSQLContext.scala) Spark 1.3.0 (RC1) missing some source files in sql.api.java --- Key: SPARK-6131 URL: https://issues.apache.org/jira/browse/SPARK-6131 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Littlestar Priority: Minor I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\*** Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java
[ https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344584#comment-14344584 ] Littlestar commented on SPARK-6131: --- spark-1.2.1 is OK: spark-1.2.1-src.jar\sql\core\src\main\scala\org\apache\spark\sql\api\JavaSQLContext.scala Spark 1.3.0 (RC1) missing some source files in sql.api.java --- Key: SPARK-6131 URL: https://issues.apache.org/jira/browse/SPARK-6131 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Littlestar Priority: Critical I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\*** Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java
[ https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6131: -- Priority: Minor (was: Critical) Spark 1.3.0 (RC1) missing some source files in sql.api.java --- Key: SPARK-6131 URL: https://issues.apache.org/jira/browse/SPARK-6131 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Littlestar Priority: Minor I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\*** Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java
[ https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344604#comment-14344604 ] Littlestar commented on SPARK-6131: --- Spark 1.2.1: sql\core\src\main\scala\org\apache\spark\sql\api\java\JavaSQLContext.scala. Spark 1.3.0 RC1 has no JavaSQLContext.scala. Spark 1.3.0 (RC1) missing some source files in sql.api.java --- Key: SPARK-6131 URL: https://issues.apache.org/jira/browse/SPARK-6131 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Littlestar Priority: Critical I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ spark-1.3.0\spark-1.3.0\sql\core\src\main\scala\org\apache\spark\sql\api\*** Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java
Littlestar created SPARK-6131: - Summary: Spark 1.3.0 (RC1) missing some source files in sql.api.java Key: SPARK-6131 URL: https://issues.apache.org/jira/browse/SPARK-6131 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Littlestar Priority: Critical I notice that [VOTE] Release Apache Spark 1.3.0 (RC1). I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz, but I checked that spark-1.3.0 has no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java
[ https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6131: -- Description: I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz. But there is no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ Thanks. was: I notice that [VOTE] Release Apache Spark 1.3.0 (RC1). I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz, but I checked that spark-1.3.0 has no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ Thanks. Spark 1.3.0 (RC1) missing some source files in sql.api.java --- Key: SPARK-6131 URL: https://issues.apache.org/jira/browse/SPARK-6131 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Littlestar Priority: Critical I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz. But there is no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6131) Spark 1.3.0 (RC1) missing some source files in sql.api.java
[ https://issues.apache.org/jira/browse/SPARK-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-6131: -- Description: I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ Thanks. was: I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: JavaSQLContext.class appeared in spark-1.3.0-bin-hadoop2.4.tgz. But there is no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ Thanks. Spark 1.3.0 (RC1) missing some source files in sql.api.java --- Key: SPARK-6131 URL: https://issues.apache.org/jira/browse/SPARK-6131 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Littlestar Priority: Critical I notice that [VOTE] Release Apache Spark 1.3.0 (RC1) http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-3-0-RC1-tt10658.html I downloaded 1.3.0 (RC1) from http://people.apache.org/~pwendell/spark-1.3.0-rc1/spark-1.3.0.tgz just for testing and built it myself, but there is no org/apache/spark/sql/api/java/JavaSQLContext found. WARN: no JavaSQLContext.scala/java in spark-1.3.0.tgz. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java - Maybe some files are missing in the 1.3.0 RC1 source. spark-1.3.0\spark-1.3.0\sql\core\src\main\java\org\apache\spark\sql\api\java\ Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods
[ https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar resolved SPARK-6049. --- Resolution: Invalid Very sorry, HiveThriftServer2.startWithContext can be called from Java. I will open another JIRA issue for the new problem. HiveThriftServer2 may expose Inheritable methods Key: SPARK-6049 URL: https://issues.apache.org/jira/browse/SPARK-6049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Labels: HiveThriftServer2 Could HiveThriftServer2 expose inheritable methods? HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app. My app uses JavaSQLContext and registerTempTable. I want to expose these temp tables through HiveThriftServer2 (the JDBC server). Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods
[ https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340096#comment-14340096 ] Littlestar edited comment on SPARK-6049 at 2/27/15 2:43 PM: Very sorry, HiveThriftServer2.startWithContext can be called from Java (SPARK-3675). My question is similar to SPARK-4865. was (Author: cnstar9988): Very sorry, HiveThriftServer2.startWithContext can be called from Java. I will open another JIRA issue for the new problem. HiveThriftServer2 may expose Inheritable methods Key: SPARK-6049 URL: https://issues.apache.org/jira/browse/SPARK-6049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Labels: HiveThriftServer2 Could HiveThriftServer2 expose inheritable methods? HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app. My app uses JavaSQLContext and registerTempTable. I want to expose these temp tables through HiveThriftServer2 (the JDBC server). Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods
[ https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340018#comment-14340018 ] Littlestar commented on SPARK-6049: --- Thanks. I can call it OK via the static method, but no tables are found. Do I need to open a new JIRA issue?
{noformat}
SparkConf sparkConf = new SparkConf().setAppName("ParquetFileQuery").set("spark.executor.memory", "4g");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaSQLContext sqlCtx = new JavaSQLContext(ctx);
JavaSchemaRDD mids = sqlCtx.parquetFile("/parquetdir");
mids.registerTempTable("demo");
HiveThriftServer2.startWithContext(new HiveContext(ctx.sc()));
{noformat}
HiveThriftServer2 may expose Inheritable methods Key: SPARK-6049 URL: https://issues.apache.org/jira/browse/SPARK-6049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Labels: HiveThriftServer2 Could HiveThriftServer2 expose inheritable methods? HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app. My app uses JavaSQLContext and registerTempTable. I want to expose these temp tables through HiveThriftServer2 (the JDBC server). Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods
[ https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339980#comment-14339980 ] Littlestar commented on SPARK-6049: --- HiveThriftServer2.scala can't be called directly from another program because of its private constructor. Its main method has a lot of Scala code that can't be rewritten in Java. I request that HiveThriftServer2.scala be refactored to expose a more programmatic API. My app: main() { JavaSQLContext sqlContext; registerTempTable("table_1"); registerTempTable("table_2"); // I want to start HiveThriftServer2 (JDBC Server) here to expose table_1 and table_2 } HiveThriftServer2.scala#startWithContext has the DeveloperApi annotation, but startWithContext is not invokable by an app. /** * :: DeveloperApi :: * Starts a new thrift server with the given context. */ @DeveloperApi def startWithContext(sqlContext: HiveContext): Unit = { val server = new HiveThriftServer2(sqlContext) server.init(sqlContext.hiveconf) server.start() } Thanks. HiveThriftServer2 may expose Inheritable methods Key: SPARK-6049 URL: https://issues.apache.org/jira/browse/SPARK-6049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Labels: HiveThriftServer2 Could HiveThriftServer2 expose inheritable methods? HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app. My app uses JavaSQLContext and registerTempTable. I want to expose these temp tables through HiveThriftServer2 (the JDBC server). Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods
[ https://issues.apache.org/jira/browse/SPARK-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339980#comment-14339980 ] Littlestar edited comment on SPARK-6049 at 2/27/15 10:33 AM: - HiveThriftServer2.scala can't be called directly from another program because of its private constructor. Its main method has a lot of Scala code that can't be rewritten in Java. I request that HiveThriftServer2.scala be refactored to expose a more programmatic API.
{noformat}
My app:
main() {
  JavaSQLContext sqlContext;
  registerTempTable("table_1");
  registerTempTable("table_2");
  // I want to start HiveThriftServer2 (JDBC Server) here to expose table_1 and table_2
}
{noformat}
HiveThriftServer2.scala#startWithContext has the DeveloperApi annotation, but startWithContext is not invokable by an app.
{noformat}
/**
 * :: DeveloperApi ::
 * Starts a new thrift server with the given context.
 */
@DeveloperApi
def startWithContext(sqlContext: HiveContext): Unit = {
  val server = new HiveThriftServer2(sqlContext)
  server.init(sqlContext.hiveconf)
  server.start()
}
{noformat}
Thanks. was (Author: cnstar9988): HiveThriftServer2.scala can't be called directly from another program because of its private constructor. Its main method has a lot of Scala code that can't be rewritten in Java. I request that HiveThriftServer2.scala be refactored to expose a more programmatic API. My app: main() { JavaSQLContext sqlContext; registerTempTable("table_1"); registerTempTable("table_2"); // I want to start HiveThriftServer2 (JDBC Server) here to expose table_1 and table_2 } HiveThriftServer2.scala#startWithContext has the DeveloperApi annotation, but startWithContext is not invokable by an app. /** * :: DeveloperApi :: * Starts a new thrift server with the given context. */ @DeveloperApi def startWithContext(sqlContext: HiveContext): Unit = { val server = new HiveThriftServer2(sqlContext) server.init(sqlContext.hiveconf) server.start() } Thanks. HiveThriftServer2 may expose Inheritable methods Key: SPARK-6049 URL: https://issues.apache.org/jira/browse/SPARK-6049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Labels: HiveThriftServer2 Could HiveThriftServer2 expose inheritable methods? HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app. My app uses JavaSQLContext and registerTempTable. I want to expose these temp tables through HiveThriftServer2 (the JDBC server). Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6049) HiveThriftServer2 may expose Inheritable methods
Littlestar created SPARK-6049: - Summary: HiveThriftServer2 may expose Inheritable methods Key: SPARK-6049 URL: https://issues.apache.org/jira/browse/SPARK-6049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Could HiveThriftServer2 expose inheritable methods? HiveThriftServer2 is very good when used as a JDBC server, but HiveThriftServer2.scala is not inheritable or invokable by an app. My app uses JavaSQLContext and registerTempTable. I want to expose these temp tables through HiveThriftServer2 (the JDBC server). Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5829) JavaStreamingContext.fileStream runs repeated empty tasks when no more new files are found
[ https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323529#comment-14323529 ] Littlestar commented on SPARK-5829: --- The following code does the same as saveAsNewAPIHadoopFiles and runs OK for me. Marking it here.
{noformat}
import org.apache.spark.streaming.Time;
import org.apache.spark.streaming.api.java.JavaPairDStream;
...
final String outDir = "/testspark/out";
is.foreachRDD(new Function2<JavaPairRDD<Integer, Integer>, Time, Void>() {
    @Override
    public Void call(JavaPairRDD<Integer, Integer> t1, Time t2) throws Exception {
        if (t1.partitions().size() > 0) {
            t1.saveAsNewAPIHadoopFile(outDir + t2.milliseconds() + "tmp",
                Integer.class, Integer.class, TextOutputFormat.class);
        }
        return null;
    }
});
{noformat}
JavaStreamingContext.fileStream runs repeated empty tasks when no more new files are found -- Key: SPARK-5829 URL: https://issues.apache.org/jira/browse/SPARK-5829 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.1 Environment: spark master (1.3.0) with SPARK-5826 patch. Reporter: Littlestar Priority: Minor spark master (1.3.0) with the SPARK-5826 patch. JavaStreamingContext.fileStream runs repeated empty tasks when no more new files arrive. To reproduce: 1. mkdir /testspark/watchdir on HDFS. 2. Run the app. 3. Put some text files into /testspark/watchdir. Every 30 seconds the Spark log indicates that a new sub-task runs, and /testspark/resultdir/ gets a new directory with empty files every 30 seconds. Even when no new files are added, it runs a new task with an empty RDD.
{noformat}
package my.test.hadoop.spark;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class TestStream {
    @SuppressWarnings({ "serial", "resource" })
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("TestStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
        jssc.checkpoint("/testspark/checkpointdir");
        Configuration jobConf = new Configuration();
        jobConf.set("my.test.fields", "fields");
        JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
            LongWritable.class, Text.class, TextInputFormat.class,
            new Function<Path, Boolean>() {
                @Override
                public Boolean call(Path v1) throws Exception {
                    return true;
                }
            }, true, jobConf).mapToPair(new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                @Override
                public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                    return new Tuple2<Integer, Integer>(1, 1);
                }
            });
        JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });
        rs.checkpoint(Durations.seconds(60));
        rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
            Integer.class, Integer.class, TextOutputFormat.class);
        jssc.start();
        jssc.awaitTermination();
    }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5829) JavaStreamingContext.fileStream runs repeated empty tasks when no more new files are found
[ https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323619#comment-14323619 ] Littlestar commented on SPARK-5829: --- But it depends on org.apache.spark.streaming.Time, which is not easy to call. It seems this is a necessary feature of the current design that can be partially worked around by filtering in user space; see SPARK-3292. JavaStreamingContext.fileStream runs repeated empty tasks when no more new files are found -- Key: SPARK-5829 URL: https://issues.apache.org/jira/browse/SPARK-5829 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.1 Environment: spark master (1.3.0) with SPARK-5826 patch. Reporter: Littlestar Priority: Minor spark master (1.3.0) with the SPARK-5826 patch. JavaStreamingContext.fileStream runs repeated empty tasks when no more new files arrive. To reproduce: 1. mkdir /testspark/watchdir on HDFS. 2. Run the app. 3. Put some text files into /testspark/watchdir. Every 30 seconds the Spark log indicates that a new sub-task runs, and /testspark/resultdir/ gets a new directory with empty files every 30 seconds. Even when no new files are added, it runs a new task with an empty RDD.
{noformat}
package my.test.hadoop.spark;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class TestStream {
    @SuppressWarnings({ "serial", "resource" })
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("TestStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
        jssc.checkpoint("/testspark/checkpointdir");
        Configuration jobConf = new Configuration();
        jobConf.set("my.test.fields", "fields");
        JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
            LongWritable.class, Text.class, TextInputFormat.class,
            new Function<Path, Boolean>() {
                @Override
                public Boolean call(Path v1) throws Exception {
                    return true;
                }
            }, true, jobConf).mapToPair(new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                @Override
                public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                    return new Tuple2<Integer, Integer>(1, 1);
                }
            });
        JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });
        rs.checkpoint(Durations.seconds(60));
        rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
            Integer.class, Integer.class, TextOutputFormat.class);
        jssc.start();
        jssc.awaitTermination();
    }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
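Editorial note: the "filtering in user space" workaround referenced above can be sketched by only writing output when a batch actually contains data. This reuses rs, Time, and TextOutputFormat from the reporter's TestStream and additionally assumes an org.apache.spark.api.java.JavaPairRDD import; count() triggers a job, so it trades extra computation for suppressing empty output directories:
{noformat}
rs.foreachRDD(new Function2<JavaPairRDD<Integer, Integer>, Time, Void>() {
    @Override
    public Void call(JavaPairRDD<Integer, Integer> rdd, Time time) throws Exception {
        // Skip the empty batches generated when no new files arrive.
        if (rdd.count() > 0) {
            rdd.saveAsNewAPIHadoopFile("/testspark/resultdir/output-" + time.milliseconds(),
                Integer.class, Integer.class, TextOutputFormat.class);
        }
        return null;
    }
});
{noformat}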
[jira] [Commented] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module
[ https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322559#comment-14322559 ] Littlestar commented on SPARK-3638: --- Oh, it was introduced in the kinesis-asl profile only. I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. Now I build Spark with the kinesis-asl profile; it's OK with httpclient 4.2.6, thanks. (Checked with mvn dependency:tree.) Commons HTTP client dependency conflict in extras/kinesis-asl module Key: SPARK-3638 URL: https://issues.apache.org/jira/browse/SPARK-3638 Project: Spark Issue Type: Bug Components: Examples, Streaming Affects Versions: 1.1.0 Reporter: Aniket Bhatnagar Labels: dependencies Fix For: 1.1.1, 1.2.0 Followed instructions as mentioned @ https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md and when running the example, I get the following error:
{code}
Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
{code}
I believe this is due to the dependency conflict as described @ http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
[ https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5834: -- Description: I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only bundles httpclient 4.1.2. Some Spark modules require httpclient 4.2 and above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. was: I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only bundles httpclient 4.1.2. Some Spark modules required httpclient 4.2 and above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. spark 1.2.1 official package bundled with httpclient 4.1.2 is too old -- Key: SPARK-5834 URL: https://issues.apache.org/jira/browse/SPARK-5834 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only bundles httpclient 4.1.2. Some Spark modules require httpclient 4.2 and above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
Littlestar created SPARK-5834: - Summary: spark 1.2.1 official package bundled with httpclient 4.1.2 is too old Key: SPARK-5834 URL: https://issues.apache.org/jira/browse/SPARK-5834 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.2.1 Reporter: Littlestar In assembly-1.1.1-hadoop2.4.0.jar, the class HttpPatch (introduced in 4.2) is not there. I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only bundles httpclient 4.1.2. Some Spark modules required httpclient 4.2 and above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
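Editorial note: the HttpPatch observation above suggests a quick runtime probe; HttpPatch only exists in httpclient 4.2 and later, so its presence distinguishes the bundled version. A minimal sketch (the class name is real, the probe itself is ours):
{noformat}
public final class HttpPatchProbe {
    public static void main(String[] args) {
        try {
            // org.apache.http.client.methods.HttpPatch was added in httpclient 4.2.
            Class.forName("org.apache.http.client.methods.HttpPatch");
            System.out.println("httpclient >= 4.2 is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("httpclient < 4.2 (HttpPatch is missing)");
        }
    }
}
{noformat}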
[jira] [Updated] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
[ https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5834: -- Description: I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only bundles httpclient 4.1.2. Some Spark modules required httpclient 4.2 and above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. was: In assembly-1.1.1-hadoop2.4.0.jar, the class HttpPatch (introduced in 4.2) is not there. I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only bundles httpclient 4.1.2. Some Spark modules required httpclient 4.2 and above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. Priority: Minor (was: Major) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old -- Key: SPARK-5834 URL: https://issues.apache.org/jira/browse/SPARK-5834 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only bundles httpclient 4.1.2. Some Spark modules required httpclient 4.2 and above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user apps that require a newer httpclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5826) JavaStreamingContext.fileStream causes Configuration NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321862#comment-14321862 ] Littlestar commented on SPARK-5826: --- I put some txt files into /testspark/watchdir, and it throws a NullPointerException: 15/02/15 16:18:20 WARN dstream.FileInputDStream: Error finding new files java.lang.NullPointerException at org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$fn$3$1.apply(JavaStreamingContext.scala:329) at org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$fn$3$1.apply(JavaStreamingContext.scala:329) at org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$isNewFile(FileInputDStream.scala:215) at org.apache.spark.streaming.dstream.FileInputDStream$$anon$3.accept(FileInputDStream.scala:172) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1489) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1523) at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:174) at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:132) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288) at scala.Option.orElse(Option.scala:257) at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285) at org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:35) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288) at scala.Option.orElse(Option.scala:257) at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285) at org.apache.spark.streaming.dstream.ShuffledDStream.compute(ShuffledDStream.scala:41) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:301) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:300) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288) at scala.Option.orElse(Option.scala:257) at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285) at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38) at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116) at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:232) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:230) at scala.util.Try$.apply(Try.scala:161) at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:230) at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:167) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78) at
[jira] [Created] (SPARK-5826) JavaStreamingContext.fileStream causes Configuration NotSerializableException
Littlestar created SPARK-5826: - Summary: JavaStreamingContext.fileStream causes Configuration NotSerializableException Key: SPARK-5826 URL: https://issues.apache.org/jira/browse/SPARK-5826 Project: Spark Issue Type: Bug Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String directory, Class<LongWritable> kClass, Class<Text> vClass, Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean newFilesOnly, Configuration conf) I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration, but it throws a strange exception. java.io.NotSerializableException: org.apache.hadoop.conf.Configuration at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440) at org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075) at org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184) at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278) at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169) at
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at
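Editorial note: the trace shows the Configuration being pulled into the checkpointed DStreamGraph, which Java serialization cannot handle. A hedged workaround sketch, assuming the custom keys are only consumed through the Hadoop configuration: set them on the SparkContext's shared Hadoop Configuration and use the fileStream overload without the conf parameter (the 4-argument overload below is the 1.3-era API; treat its exact signature as an assumption):
{noformat}
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public final class FileStreamNoConfSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("FileStreamNoConf");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
        jssc.checkpoint("/testspark/checkpointdir");
        // Put the custom key on the shared Hadoop configuration instead of a
        // standalone Configuration object that would be serialized with the graph.
        jssc.sparkContext().hadoopConfiguration().set("my.test.fields", "fields");
        JavaPairInputDStream<LongWritable, Text> is = jssc.fileStream(
                "/testspark/watchdir", LongWritable.class, Text.class, TextInputFormat.class);
        is.print();
        jssc.start();
        jssc.awaitTermination();
    }
}
{noformat}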
[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream causes Configuration NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5826: -- Attachment: TestStream.java JavaStreamingContext.fileStream causes Configuration NotSerializableException Key: SPARK-5826 URL: https://issues.apache.org/jira/browse/SPARK-5826 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Attachments: TestStream.java org.apache.spark.streaming.api.java.JavaStreamingContext.fileStream(String directory, Class<LongWritable> kClass, Class<Text> vClass, Class<TextInputFormat> fClass, Function<Path, Boolean> filter, boolean newFilesOnly, Configuration conf) I use JavaStreamingContext.fileStream on 1.3.0/master with a Configuration, but it throws a strange exception. java.io.NotSerializableException: org.apache.hadoop.conf.Configuration at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440) at org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075) at org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184) at
org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278) at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at
[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5826: -- Attachment: (was: TestStream.java)
[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321848#comment-14321848 ] Littlestar commented on SPARK-5795: --- works for me, thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321849#comment-14321849 ] Littlestar edited comment on SPARK-5826 at 2/15/15 7:51 AM: testcode upload. It throws an exception every 2 seconds. 15/02/15 15:50:35 ERROR actor.OneForOneStrategy: org.apache.hadoop.conf.Configuration java.io.NotSerializableException: org.apache.hadoop.conf.Configuration at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:440) at org.apache.spark.streaming.DStreamGraph$$anonfun$writeObject$1.apply$mcV$sp(DStreamGraph.scala:177) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1075) at org.apache.spark.streaming.DStreamGraph.writeObject(DStreamGraph.scala:172) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:184) at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:278) at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:169) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:78) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1.aroundReceive(JobGenerator.scala:76) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) was (Author: cnstar9988): testcode upload. !TestStream.java!
[jira] [Comment Edited] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321848#comment-14321848 ] Littlestar edited comment on SPARK-5795 at 2/15/15 8:05 AM: I merged pull/4608 and rebuilt; it works for me, thanks. was (Author: cnstar9988): works for me, thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5826: -- Attachment: TestStream.java testcode upload. !TestStream.java!
[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5826: -- Component/s: Streaming
[jira] [Commented] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322317#comment-14322317 ] Littlestar commented on SPARK-5826: --- I merged pull/4612 and rebuilt; it works for me, no exception, thanks.
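For anyone stuck on a build without that fix, a possible application-side workaround is sketched below. It is only a sketch and rests on two version-dependent assumptions: that the fileStream overload without a Configuration argument is available in the Java API, and that the per-batch Hadoop RDDs then fall back to the SparkContext's shared Hadoop configuration. The idea is to put the custom keys (my.test.fields, from the attached TestStream.java) on that shared configuration, so no standalone, non-serializable Configuration object is captured in the checkpointed DStream graph.

{noformat}
// Hedged workaround sketch, not the official fix (the fix itself landed in pull/4612).
// Needs the same imports as TestStream.java plus
// org.apache.spark.streaming.api.java.JavaPairInputDStream.
jssc.sparkContext().hadoopConfiguration().set("my.test.fields", "fields");

JavaPairInputDStream<LongWritable, Text> in = jssc.fileStream(
        "/testspark/watchdir", LongWritable.class, Text.class, TextInputFormat.class,
        new Function<Path, Boolean>() {
            @Override
            public Boolean call(Path path) throws Exception {
                return true;  // accept every file, as in the original test
            }
        }, true /* newFilesOnly */);
{noformat}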
[jira] [Created] (SPARK-5829) JavaStreamingContext.fileStream run task repeated empty when no more new files
Littlestar created SPARK-5829: - Summary: JavaStreamingContext.fileStream run task repeated empty when no more new files Key: SPARK-5829 URL: https://issues.apache.org/jira/browse/SPARK-5829 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.1 Environment: spark master (1.3.0) with SPARK-5826 patch. Reporter: Littlestar spark master (1.3.0) with SPARK-5826 patch. JavaStreamingContext.fileStream runs repeated empty tasks when there are no more new files. To reproduce: 1. mkdir /testspark/watchdir on HDFS. 2. Run the app. 3. Put some text files into /testspark/watchdir. Every 30 seconds the Spark log indicates that a new sub task runs, and /testspark/resultdir/ gets a new directory with empty files. Even when no new files are added, it still runs a new task over an empty RDD.
{noformat}
package my.test.hadoop.spark;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class TestStream {
    @SuppressWarnings({ "serial", "resource" })
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("TestStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
        jssc.checkpoint("/testspark/checkpointdir");

        Configuration jobConf = new Configuration();
        jobConf.set("my.test.fields", "fields");

        JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                LongWritable.class, Text.class, TextInputFormat.class,
                new Function<Path, Boolean>() {
                    @Override
                    public Boolean call(Path v1) throws Exception {
                        return true;
                    }
                }, true, jobConf).mapToPair(new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                    @Override
                    public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                        return new Tuple2<Integer, Integer>(1, 1);
                    }
                });

        JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });
        rs.checkpoint(Durations.seconds(60));
        rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                Integer.class, Integer.class, TextOutputFormat.class);
        jssc.start();
        jssc.awaitTermination();
    }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5829) JavaStreamingContext.fileStream run task repeated empty when no more new files
[ https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322321#comment-14322321 ] Littlestar commented on SPARK-5829: --- When I add new files into /testspark/watchdir, it runs a new task with good output. When no new files are added, it runs a new task with an empty RDD every 30 seconds. (I think there is a bug when no new files are found.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found
[ https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5829: -- Summary: JavaStreamingContext.fileStream run task loop repeated empty when no more new files found (was: JavaStreamingContext.fileStream run task repeated empty when no more new files) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found
[ https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5829: -- Priority: Minor (was: Major) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found
[ https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322408#comment-14322408 ] Littlestar edited comment on SPARK-5829 at 2/16/15 6:05 AM: FileInputDStream.scala
{noformat}
override def compute(validTime: Time): Option[RDD[(K, V)]] = {
  // Find new files
  val newFiles = findNewFiles(validTime.milliseconds)
  logInfo("New files at time " + validTime + ":\n" + newFiles.mkString("\n"))
  batchTimeToSelectedFiles += ((validTime, newFiles))
  recentlySelectedFiles ++= newFiles
  +maybe a check for newFiles.size > 0 here can avoid this problem??+
  Some(filesToRDD(newFiles))
}
{noformat}
Thanks. was (Author: cnstar9988): FileInputDStream.scala override def compute(validTime: Time): Option[RDD[(K, V)]] = { // Find new files val newFiles = findNewFiles(validTime.milliseconds) logInfo("New files at time " + validTime + ":\n" + newFiles.mkString("\n")) batchTimeToSelectedFiles += ((validTime, newFiles)) recentlySelectedFiles ++= newFiles +maybe a check for newFiles.size > 0 here can avoid this problem??+ Some(filesToRDD(newFiles)) } Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
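Until a guard like that lands in FileInputDStream, the same effect can be approximated at the application level. The sketch below is an assumed workaround, not part of any patch in this thread: it replaces the unconditional saveAsNewAPIHadoopFiles call from the test program above with a foreachRDD that only writes batches containing records, so idle intervals stop producing directories of empty files. The timestamp-suffixed output path is illustrative.

{noformat}
// Assumed application-side workaround: skip empty batches before writing.
// Needs the same imports as TestStream.java plus org.apache.spark.api.java.JavaPairRDD.
rs.foreachRDD(new Function<JavaPairRDD<Integer, Integer>, Void>() {
    @Override
    public Void call(JavaPairRDD<Integer, Integer> rdd) throws Exception {
        // count() forces evaluation of the batch; acceptable for small test batches.
        if (rdd.count() > 0) {
            rdd.saveAsNewAPIHadoopFile(
                    "/testspark/resultdir/output-" + System.currentTimeMillis(),
                    Integer.class, Integer.class, TextOutputFormat.class);
        }
        return null;
    }
});
{noformat}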
[jira] [Created] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
Littlestar created SPARK-5795: - Summary: api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java Key: SPARK-5795 URL: https://issues.apache.org/jira/browse/SPARK-5795 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.1 Reporter: Littlestar import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; the following code does not compile in Java:
{noformat}
JavaPairDStream<Integer, Integer> rs = ...
rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf);
{noformat}
but similar code on JavaPairRDD works OK:
{noformat}
JavaPairRDD<String, String> counts = ...
counts.saveAsNewAPIHadoopFile("out", Text.class, Text.class, TextOutputFormat.class, jobConf);
{noformat}
maybe the
{noformat}
def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}
{noformat}
should become
{noformat}
def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
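To make the Java-side symptom concrete, here is a minimal, self-contained restatement of the call site; the class name SaveFilesCallSite is illustrative, and rs/jobConf stand in for the stream and configuration built in the attached TestStreamCompile.java. Against the existential signature the marked call is rejected by the Java compiler, and the identical call compiles once the method takes a type parameter F, matching JavaPairRDD.saveAsNewAPIHadoopFile.

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public class SaveFilesCallSite {
    static void save(JavaPairDStream<Integer, Integer> rs, Configuration jobConf) {
        // Rejected while the parameter is declared Class[_ <: NewOutputFormat[_, _]];
        // accepted once the method is parameterized as [F <: NewOutputFormat[_, _]]
        // with outputFormatClass: Class[F].
        rs.saveAsNewAPIHadoopFiles("prefix", "txt", Integer.class, Integer.class,
                TextOutputFormat.class, jobConf);
    }
}
{noformat}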
[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319769#comment-14319769 ] Littlestar commented on SPARK-5795: --- org.apache.spark.api.java.JavaPairRDD[K, V]
{noformat}
/** Output the RDD to any Hadoop-supported file system. */
def saveAsHadoopFile[F <: OutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: JobConf) {
  rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass, conf)
}

/** Output the RDD to any Hadoop-supported file system. */
def saveAsHadoopFile[F <: OutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F]) {
  rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass)
}

/** Output the RDD to any Hadoop-supported file system, compressing with the supplied codec. */
def saveAsHadoopFile[F <: OutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    codec: Class[_ <: CompressionCodec]) {
  rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass, codec)
}

/** Output the RDD to any Hadoop-supported file system. */
def saveAsNewAPIHadoopFile[F <: NewOutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: Configuration) {
  rdd.saveAsNewAPIHadoopFile(path, keyClass, valueClass, outputFormatClass, conf)
}

/**
 * Output the RDD to any Hadoop-supported storage system, using
 * a Configuration object for that storage system.
 */
def saveAsNewAPIHadoopDataset(conf: Configuration) {
  rdd.saveAsNewAPIHadoopDataset(conf)
}

/** Output the RDD to any Hadoop-supported file system. */
def saveAsNewAPIHadoopFile[F <: NewOutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F]) {
  rdd.saveAsNewAPIHadoopFile(path, keyClass, valueClass, outputFormatClass)
}
{noformat}
org.apache.spark.streaming.api.java.JavaPairDStream[K, V]
{noformat}
/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix".
 */
def saveAsHadoopFiles[F <: OutputFormat[K, V]](prefix: String, suffix: String) {
  dstream.saveAsHadoopFiles(prefix, suffix)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix".
 */
def saveAsHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: OutputFormat[_, _]]) {
  dstream.saveAsHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix".
 */
def saveAsHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: OutputFormat[_, _]],
    conf: JobConf) {
  dstream.saveAsHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix".
 */
def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[K, V]](prefix: String, suffix: String) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix".
 */
def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]]) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: "prefix-TIME_IN_MS.suffix".
 */
def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}
{noformat}
api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[jira] [Updated] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5795: -- Attachment: TestStreamCompile.java My test case, on Java 1.7 and Spark 1.3 trunk. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320109#comment-14320109 ] Littlestar commented on SPARK-5795: --- Is this the same problem as SPARK-5297? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320083#comment-14320083 ] Littlestar commented on SPARK-5795: --- Error info: The method saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?,?>>) in the type JavaPairDStream<Integer,Integer> is not applicable for the arguments (String, String, Class<Integer>, Class<Integer>, Class<TextOutputFormat>) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org