Why does Apache Spark Master shut down when Zookeeper expires the session

2019-03-05 Thread lokeshkumar
As I understand it, the Apache Spark Master can be run in high availability mode
using Zookeeper. That is, multiple Spark masters can run in Leader/Follower
mode, and these masters are registered with Zookeeper.

In our scenario Zookeeper is expiring the session of the Spark Master that is
acting as Leader. The Spark Master which is the leader receives this
notification and deliberately shuts down.

Can someone explain why this decision to shut down, rather than retry,
was taken?

And why does Kafka retry connecting to Zookeeper when it receives the same
Expiry notification?
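
For illustration only, and not Spark's actual ZooKeeperLeaderElectionAgent code: below is a
minimal Curator-based sketch of the two policies the question contrasts. On losing leadership,
a process can either exit, so that a standby instance (whose ZooKeeper session is still valid)
takes over and there is no risk of two nodes acting as master, or stay up and re-enter the
election once its client reconnects. The connection string, latch path and class name are
placeholders.

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;
    import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class LeaderElectionSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder ensemble address and election path.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
            "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        final LeaderLatch latch = new LeaderLatch(client, "/demo/leader-election");
        latch.addListener(new LeaderLatchListener() {
          @Override public void isLeader() {
            System.out.println("Gained leadership - start serving");
          }
          @Override public void notLeader() {
            // Policy A (what the question describes the standalone master doing):
            // give up and exit, relying on standby instances / external supervision
            // to provide continuity and avoid two processes acting as leader.
            System.out.println("Lost leadership - shutting down");
            System.exit(0);
            // Policy B (closer to what the question describes for Kafka): stay
            // alive, let the ZooKeeper client reconnect and re-enter the election.
          }
        });
        latch.start();
        Thread.currentThread().join();
      }
    }

With policy A the assumption is that something external (standby masters plus a process
supervisor) restores redundancy, which is presumably why exiting is considered safe there.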



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark 2.4.0 Master going down

2019-02-28 Thread lokeshkumar
Hi Akshay

Thanks for the response. Please find the answers to your questions below.

1. We are running Spark in cluster mode, the cluster manager being Spark's
standalone cluster manager.
2. All the ports are open, and we preconfigure the ports on which
communication should happen and modify firewall rules to allow traffic on
them. (The functionality is fine until the Spark master goes down after 60
minutes.)
3. Memory consumption of all the components:

Columns: S0  S1  E  O  M  CCS  YGC  YGCT  FGC  FGCT  GCT

Spark Master:
  0.00   0.00   12.91  35.11  97.08  95.80  50.239  20.197  0.436
Spark Worker:
  51.64  0.00   46.66  27.44  97.57  95.85  100.381  20.233  0.613
Spark Submit Process (Driver):
  0.00   63.57  93.82  26.29  98.24  97.53  4663  124.648  109  20.910  145.558
Spark executor (Coarse grained):
  0.00   69.77  17.74  31.13  95.67  90.44  7353  556.888  51.572  558.460



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




Spark 2.4.0 Master going down

2019-02-27 Thread lokeshkumar
Hi All

We are running Spark version 2.4.0 and we run a few Spark streaming jobs
listening on Kafka topics, receiving an average of 10-20 messages per second.
The Spark master has been going down after 1-2 hours of running; the exception
is given below. Along with that, the Spark executors also get killed.

This was not happening with Spark 2.1.1; it started happening with Spark
2.4.0. Any help/suggestion is appreciated.

The exception that we see is:

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at
org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
at
org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any
reply from 192.168.43.167:40007 in 120 seconds. This timeout is controlled
by spark.rpc.askTimeout
at
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Failure.recover(Try.scala:216)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at
org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at
scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
at
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at 
scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at 
scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at
scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at
scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:157)
at
org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:206)
at
org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:243)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply
from 192.168.43.167:40007 in 120 seconds
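
One thing worth trying while the root cause is investigated (a mitigation only, not a fix):
the trace says the 120-second limit comes from spark.rpc.askTimeout, and that property,
together with the broader spark.network.timeout, can be raised on the driver's SparkConf
(or via --conf on spark-submit). The values below are arbitrary examples, and this is a
fragment meant to be folded into the existing driver setup.

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .setAppName("streaming-job")
        .set("spark.rpc.askTimeout", "300s")      // the trace shows the default 120s being hit
        .set("spark.network.timeout", "300s");    // default used by several network timeouts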



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Spark streaming rawSocketStream with protobuf

2016-04-02 Thread lokeshkumar
I am trying out Spark Streaming and listening to a socket. I am using the
rawSocketStream method to create a receiver and a DStream, but when I print
the DStream I get the exception below.

Code to create a DStream:

    JavaSparkContext jsc = new JavaSparkContext("Master", "app");
    JavaStreamingContext jssc = new JavaStreamingContext(jsc, new Seconds(3));
    JavaReceiverInputDStream rawStream = jssc.rawSocketStream("localhost", );
    log.info(tracePrefix + "Created the stream ...");
    rawStream.print();
    jssc.start();
    jssc.awaitTermination();

Code to send a protobuf object over the TCP connection:

    FileInputStream input = new FileInputStream("address_book");
    AddressBook book = AddressBookProtos.AddressBook.parseFrom(input);
    log.info(tracePrefix + "Size of contacts: " + book.getPersonList().size());
    ServerSocket serverSocket = new ServerSocket();
    log.info(tracePrefix + "Waiting for connections ...");
    Socket s1 = serverSocket.accept();
    log.info(tracePrefix + "Accepted a connection ...");
    while (true) {
        Thread.sleep(3000);
        ObjectOutputStream out = new ObjectOutputStream(s1.getOutputStream());
        out.writeByte(book.getSerializedSize());
        out.write(book.toByteArray());
        out.flush();
        log.info(tracePrefix + "Written to new socket");
    }

Stacktrace is shown below:

java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at org.apache.spark.streaming.dstream.RawNetworkReceiver.onStart(RawInputDStream.scala:88)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

2016-04-02 07:45:47,607 ERROR [Executor task launch worker-0]
org.apache.spark.streaming.receiver.ReceiverSupervisorImpl
Stopped receiver with error: java.lang.IllegalArgumentException

2016-04-02 07:45:47,613 ERROR [Executor task launch worker-0]
org.apache.spark.executor.Executor
Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at org.apache.spark.streaming.dstream.RawNetworkReceiver.onStart(RawInputDStream.scala:88)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

2016-04-02 07:45:47,646 ERROR [task-result-getter-0]
org.apache.spark.scheduler.TaskSetManager
Task 0 in stage 0.0 failed 1 times; aborting job

2016-04-02 07:45:47,656 ERROR [submit-job-thread-pool-0]
org.apache.spark.streaming.scheduler.ReceiverTracker
Receiver has been stopped. Try to restart it.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0
(TID 0, localhost): java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at org.apache.spark.streaming.dstream.RawNetworkReceiver.onStart(RawInputDStream.scala:88)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
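
A note on the sender side (an assumption based on the failure point, not a verified fix):
the raw socket receiver reads a 4-byte length prefix and then allocates a ByteBuffer of that
size, which is where the IllegalArgumentException comes from when the bytes at the front of
the stream (here the ObjectOutputStream header plus a single writeByte) decode to a bogus or
negative length. A sketch of a sender that length-prefixes each payload with DataOutputStream
is below; note also that rawSocketStream expects payloads already in Spark's serialized-block
format, so for arbitrary protobuf messages socketStream with a custom converter, or a custom
receiver, may be the better fit. The port number is an example value.

    import java.io.DataOutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class LengthPrefixedSender {
      public static void main(String[] args) throws Exception {
        byte[] payload = Files.readAllBytes(Paths.get("address_book"));  // raw message bytes
        try (ServerSocket server = new ServerSocket(9999);               // example port
             Socket sock = server.accept();
             DataOutputStream out = new DataOutputStream(sock.getOutputStream())) {
          while (true) {
            out.writeInt(payload.length);  // 4-byte big-endian length prefix
            out.write(payload);            // then exactly that many bytes
            out.flush();
            Thread.sleep(3000);
          }
        }
      }
    }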

Twitter receiver not running in spark 1.6.0

2016-03-27 Thread lokeshkumar
Hi forum

For some reason, if I include a Twitter receiver and start the streaming
context, I get the below exception; I am not sure why.
Can someone let me know if anyone has already encountered this issue, or
whether I am doing something wrong?

java.lang.ArithmeticException: / by zero
at org.apache.spark.streaming.Duration.isMultipleOf(Duration.scala:59)
at
org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:324)
at
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
at scala.Option.orElse(Option.scala:257)
at
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:341)
at
org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:47)
at
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
at
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:114)
at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at
org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:114)
at
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
at
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
at scala.util.Try$.apply(Try.scala:161)
at
org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
at
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
at
org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at
org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
2016-03-28 10:08:00,112 ERROR [DefaultQuartzScheduler_Worker-8]
org.quartz.core.JobRunShell
Job sample_TwitterListener.SocialMedia_sample threw an unhandled Exception: 
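
For what it's worth, the "/ by zero" inside Duration.isMultipleOf usually points at a zero
batch or slide duration rather than at the Twitter receiver itself, so a minimal context with
an explicit non-zero batch interval and only the Twitter receiver is a useful baseline for
isolating the problem. The sketch below assumes the twitter4j.oauth.* credentials are supplied
as system properties (the default for TwitterUtils.createStream); the app name and batch
interval are arbitrary.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.twitter.TwitterUtils;
    import twitter4j.Status;

    public class TwitterSmokeTest {
      public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("twitter-smoke-test");
        // Explicit, non-zero batch interval: a zero duration is what makes
        // Duration.isMultipleOf divide by zero.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
        JavaReceiverInputDStream<Status> tweets = TwitterUtils.createStream(jssc);
        tweets.print();
        jssc.start();
        jssc.awaitTermination();
      }
    }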




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Twitter-receiver-not-running-in-spark-1-6-0-tp26612.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to contribute by picking up starter bugs

2015-12-27 Thread lokeshkumar
Thanks a lot Jim, looking forward to picking up some bugs.

On Mon, Dec 28, 2015 at 8:42 AM, jiml [via Apache Spark User List] <
ml-node+s1001560n25813...@n3.nabble.com> wrote:

> You probably want to start on the dev list:
> http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> I have seen bugs where someone does just what you suggest, great volunteer
> spirit! I think you have the right idea, just jump in there, create the
> pull request and ask them to assign the bug to you :)
>
> --
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-contribute-by-picking-up-starter-bugs-tp25795p25813.html
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-contribute-by-picking-up-starter-bugs-tp25795p25815.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

How to contribute by picking up starter bugs

2015-12-24 Thread lokeshkumar
Hi 

From the "how to contribute" page of the Spark JIRA project, I came to know
that I can start by picking up bugs with the starter label.
But who will assign these bugs to me? Or should I just fix them and create a
pull request?

Will be glad to help the project.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-contribute-by-picking-up-starter-bugs-tp25795.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark 1.4.0 org.apache.spark.sql.AnalysisException: cannot resolve 'probability' given input columns

2015-07-16 Thread lokeshkumar
Hi forum,

I am currently using Spark 1.4.0 and have started using the ML pipeline
framework. I ran the example program ml.JavaSimpleTextClassificationPipeline,
which uses LogisticRegression. But I wanted to do multiclass classification,
so I used the DecisionTreeClassifier present in the
org.apache.spark.ml.classification package. The model got trained properly
using the fit method, but when testing the model using the print statement
from the above example, I get the following error saying that the
'probability' column is not present.

Is this column present only for LogisticRegression? If so, can I see which
columns are present after DecisionTreeClassifier predicts the output? Also,
one more thing: how can I convert the predicted output back to String format
if I am using StringIndexer?

org.apache.spark.sql.AnalysisException: cannot resolve 'probability' given
input columns id, prediction, labelStr, data, features, words, label;
at
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:63)
at
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52)
at
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
at
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
at
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
at
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
at
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108)
at
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:123)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)   at
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)   at
scala.collection.AbstractTraversable.map(Traversable.scala:105) at
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:122)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)  at
scala.collection.Iterator$class.foreach(Iterator.scala:727) at
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)  at
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)   
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)at
scala.collection.AbstractIterator.to(Iterator.scala:1157)   at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)  at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)   
at
scala.collection.AbstractIterator.toArray(Iterator.scala:1157)  at
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:127)
at
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:52)
at
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
at
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
at
org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:920)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
at org.apache.spark.sql.DataFrame.select(DataFrame.scala:595)
at org.apache.spark.sql.DataFrame.select(DataFrame.scala:611)
at org.apache.spark.sql.DataFrame.select(DataFrame.scala:611)
at com.xxx.ml.xxx.execute(xxx.java:129)
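
A hedged note on the two questions above: the analysis error itself lists the columns that do
exist (id, prediction, labelStr, data, features, words, label), and the most direct way to see
what a fitted model produces is to print the schema of the transformed DataFrame and select
only columns that appear there. The fragment below assumes 'model' and 'test' from the
existing pipeline code; mapping the numeric prediction back to the original string label is
what org.apache.spark.ml.feature.IndexToString does in later Spark releases.

    import org.apache.spark.sql.DataFrame;

    // Fragment: 'model' and 'test' come from the existing pipeline code.
    DataFrame predictions = model.transform(test);
    predictions.printSchema();                        // shows exactly which output columns exist
    predictions.select("id", "prediction", "label").show();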



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-org-apache-spark-sql-AnalysisException-cannot-resolve-probability-given-input-columns-tp23874.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark 1.4.0 compute-classpath.sh

2015-07-15 Thread lokeshkumar
Hi forum

I have downloaded the latest Spark version, 1.4.0, and started using it.
But I couldn't find the compute-classpath.sh file in bin/, which I was using
in previous versions to provide third-party libraries to my application.

Can anyone please let me know where I can provide a CLASSPATH with my
third-party libs in 1.4.0?

Thanks
Lokesh
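
For reference: as far as I can tell, compute-classpath.sh was removed from bin/ in 1.4.0, and
the usual replacements are spark-submit's --jars / --driver-class-path options or the
extraClassPath properties in conf/spark-defaults.conf. A sketch with placeholder jar paths:

    # conf/spark-defaults.conf -- paths below are placeholders
    spark.driver.extraClassPath    /opt/myapp/lib/thirdparty.jar
    spark.executor.extraClassPath  /opt/myapp/lib/thirdparty.jar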



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-compute-classpath-sh-tp23860.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread lokeshkumar
Hi Forum,

Using Spark Streaming and listening to files in HDFS via the
textFileStream/fileStream methods, how do we get the file names that are read
by these methods?

I used textFileStream, which gives the file contents in a JavaDStream, and I
had no success with fileStream as it throws a compilation error with Spark
version 1.3.1.

Can someone please tell me if there is an API function or any other way to
get the file names that these streaming methods read?

Thanks
Lokesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-textFileStream-fileStream-Get-file-name-tp22692.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark 1.3.1 JavaStreamingContext - fileStream compile error

2015-04-28 Thread lokeshkumar

Hi Forum 

I am facing the below compile error when using the fileStream method of the
JavaStreamingContext class.
I have copied the code from the JavaAPISuite.java test class of the Spark test code.

The error message is:

The method fileStream(String, Class<K>, Class<V>, Class<F>,
Function<Path,Boolean>, boolean) in the type JavaStreamingContext is not
applicable for the arguments (String, Class<LongWritable>, Class<Text>,
Class<TextInputFormat>, new Function<Path,Boolean>(){}, boolean)

Please help me to find a solution for this. 

http://apache-spark-user-list.1001560.n3.nabble.com/file/n22683/47.png 
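
For anyone hitting the same compiler message, here is a sketch of a call that matches the
overload named in the error (the directory path is a placeholder and 'jssc' is the existing
JavaStreamingContext). A frequent cause of "not applicable for the arguments" here is
importing the wrong Path or Function type: the signature wants org.apache.hadoop.fs.Path and
org.apache.spark.api.java.function.Function, and TextInputFormat must be the new-API
org.apache.hadoop.mapreduce.lib.input.TextInputFormat.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;

    // Fragment: 'jssc' is the existing JavaStreamingContext.
    JavaPairInputDStream<LongWritable, Text> stream = jssc.fileStream(
        "hdfs:///tmp/in",                      // illustrative directory
        LongWritable.class,
        Text.class,
        TextInputFormat.class,
        new Function<Path, Boolean>() {
          @Override
          public Boolean call(Path path) {
            return Boolean.TRUE;               // accept every file
          }
        },
        true);                                 // newFilesOnly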



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-1-JavaStreamingContext-fileStream-compile-error-tp22683.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark 1.3.1 JavaStreamingContext - fileStream compile error

2015-04-27 Thread lokeshkumar
Hi Forum

I am facing the below compile error when using the fileStream method of the
JavaStreamingContext class.
I have copied the code from the JavaAPISuite.java test class of the Spark test code.

Please help me to find a solution for this.

http://apache-spark-user-list.1001560.n3.nabble.com/file/n22679/47.png 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-1-JavaStreamingContext-fileStream-compile-error-tp22679.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: java.io.InvalidClassException: org.apache.spark.api.java.JavaUtils$SerializableMapWrapper; no valid constructor

2014-12-01 Thread lokeshkumar
The workaround was to wrap the map returned by the Spark libraries in a
HashMap and then broadcast that.
Could anyone please let me know if there is an open issue for this?
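
For readers of the archive, a minimal sketch of the workaround described above: copy the map
returned by a Spark action into a plain java.util.HashMap before broadcasting it, so a
standard JDK class is serialized instead of the SerializableMapWrapper. 'sc' (the existing
JavaSparkContext) and 'rdd' (an existing JavaRDD<String>) are placeholders.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.broadcast.Broadcast;

    // Fragment illustrating the workaround; 'rdd' and 'sc' are placeholders.
    Map<String, Long> counts = rdd.countByValue();          // may come back as a SerializableMapWrapper
    Broadcast<HashMap<String, Long>> bc =
        sc.broadcast(new HashMap<>(counts));                // a plain HashMap broadcasts fine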



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-InvalidClassException-org-apache-spark-api-java-JavaUtils-SerializableMapWrapper-no-valid-cor-tp20034p20070.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



java.io.InvalidClassException: org.apache.spark.api.java.JavaUtils$SerializableMapWrapper; no valid constructor

2014-11-29 Thread lokeshkumar
Hi forum, 

We have been using Spark 1.1.0 and, due to some bugs in it, upgraded to the
latest 1.3.0 from the master branch.
We are getting the below error while using a broadcast variable.

Could anyone please point out what's wrong here?

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 815081.0 failed 4 times, most recent failure: Lost task 0.3 in stage
815081.0 (TID 4751, ns2.x.net): java.io.InvalidClassException:
org.apache.spark.api.java.JavaUtils$SerializableMapWrapper; no valid
constructor 
at
java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:150)
 
at
java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:768) 
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775) 
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
at
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
 
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
 
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1000) 
at
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
 
at
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
 
at
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) 
at
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) 
at
com.xxx.common.chores.kafka.kafkaListenerChore$4$1.call(kafkaListenerChore.java:418)
 
at
com.xxx.common.chores.kafka.kafkaListenerChore$4$1.call(kafkaListenerChore.java:406)
 
at
org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:195)
 
at
org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:195)
 
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:775) 
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:775) 
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) 
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) 
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) 
at org.apache.spark.scheduler.Task.run(Task.scala:56) 
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) 
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-InvalidClassException-org-apache-spark-api-java-JavaUtils-SerializableMapWrapper-no-valid-cor-tp20034.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



java.io.InvalidClassException: org.apache.spark.api.java.JavaUtils$SerializableMapWrapper; no valid constructor

2014-11-29 Thread lokeshkumar
Hi forum,

We have been using Spark 1.1.0 and, due to some bugs in it, upgraded to the
latest 1.3.0 from the master branch.
We are getting the below error while using a broadcast variable.

Could anyone please point out what's wrong here?

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 815081.0 failed 4 times, most recent failure: Lost task 0.3 in stage
815081.0 (TID 4751, ns2.dataken.net): java.io.InvalidClassException:
org.apache.spark.api.java.JavaUtils$SerializableMapWrapper; no valid
constructor
at
java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:150)
at 
java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:768)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1000)
at
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at
com.xxx.common.chores.kafka.kafkaListenerChore$4$1.call(kafkaListenerChore.java:418)
at
com.xxx.common.chores.kafka.kafkaListenerChore$4$1.call(kafkaListenerChore.java:406)
at
org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:195)
at
org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:195)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:775)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:775)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-InvalidClassException-org-apache-spark-api-java-JavaUtils-SerializableMapWrapper-no-valid-cor-tp20033.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Multiple SparkContexts in same Driver JVM

2014-11-29 Thread lokeshkumar
Hi Forum,

Is it not possible to run multiple SparkContexts concurrently, without
stopping the other one, in Spark 1.3.0?
I have been trying this out and getting the below error.

Caused by: org.apache.spark.SparkException: Only one SparkContext may be
running in this JVM (see SPARK-2243). To ignore this error, set
spark.driver.allowMultipleContexts = true. The currently running
SparkContext was created at:

According to this, it's not possible to create one unless we specify the
option spark.driver.allowMultipleContexts = true.

So is there a way to create multiple concurrently running SparkContexts in
the same JVM, or should we trigger driver processes in different JVMs to do
the same?

Also, please let me know where the option
'spark.driver.allowMultipleContexts' should be set. I have set it in
spark-env.sh SPARK_MASTER_OPTS but with no luck.
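
On the last question: spark.driver.allowMultipleContexts is checked by the driver when the
SparkContext is constructed, so it belongs on the SparkConf (or --conf of spark-submit)
rather than in SPARK_MASTER_OPTS in spark-env.sh. A minimal sketch (the app name is a
placeholder; the setting only suppresses the error, and multiple contexts in one JVM remain
discouraged):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Fragment: set the flag on the driver-side SparkConf before building the context.
    SparkConf conf = new SparkConf()
        .setAppName("multi-context-test")
        .set("spark.driver.allowMultipleContexts", "true");
    JavaSparkContext sc = new JavaSparkContext(conf);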



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Multiple-SparkContexts-in-same-Driver-JVM-tp20037.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Issue with Spark latest 1.2.0 build - ClassCastException from [B to SerializableWritable

2014-11-26 Thread lokeshkumar
The above issue happens while performing the below operation on a JavaRDD
(calling take() on the RDD):

    JavaRDD<String> loadedRDD = sc.textFile(...);
    String[] tokens = loadedRDD.take(1).get(0).split(",");





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-with-Spark-latest-1-2-0-build-ClassCastException-from-B-to-SerializableWritable-tp19824p19859.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Issue with Spark latest 1.2.0 build - ClassCastException from [B to SerializableWritable

2014-11-26 Thread lokeshkumar
Hi Sean

Thanks for the reply.

We upgraded our Spark cluster from 1.1.0 to 1.2.0, and we also thought that
this issue might be due to mismatched Spark jar versions.
But we double-checked and reinstalled our app completely on a new system
with the spark-1.2.0 distro, and still no result; we are facing the same
problem.

This does not happen when the master is set to 'local[*]'.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-with-Spark-latest-1-2-0-build-ClassCastException-from-B-to-SerializableWritable-tp19824p19864.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Issue with Spark latest 1.2.0 build - ClassCastException from [B to SerializableWritable

2014-11-25 Thread lokeshkumar
Hello forum,

We are using spark distro built from the source of latest 1.2.0 tag.
And we are facing the below issue, while trying to act upon the JavaRDD
instance, the stacktrace is given below.
Can anyone please let me know, what can be wrong here?

java.lang.ClassCastException: [B cannot be cast to
org.apache.spark.SerializableWritable
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:138)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
at 
org.apache.spark.api.java.JavaRDDLike$class.take(JavaRDDLike.scala:419)
at org.apache.spark.api.java.JavaRDD.take(JavaRDD.scala:32)
at
com.dataken.common.chores.InformationDataLoadChore.run(InformationDataLoadChore.java:69)
at com.dataken.common.pipeline.DatakenTask.start(DatakenTask.java:110)
at
com.dataken.tasks.objectcentricprocessor.ObjectCentricProcessTask.execute(ObjectCentricProcessTask.java:99)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
2014-11-26 08:07:38,454 ERROR [DefaultQuartzScheduler_Worker-2]
org.quartz.core.ErrorLogger
Job (report_report.report_report threw an exception.

org.quartz.SchedulerException: Job threw an unhandled exception. [See nested
exception: java.lang.ClassCastException: [B cannot be cast to
org.apache.spark.SerializableWritable]
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.lang.ClassCastException: [B cannot be cast to
org.apache.spark.SerializableWritable
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:138)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
at 
org.apache.spark.api.java.JavaRDDLike$class.take(JavaRDDLike.scala:419)
at org.apache.spark.api.java.JavaRDD.take(JavaRDD.scala:32)
at
com.dataken.common.chores.InformationDataLoadChore.run(InformationDataLoadChore.java:69)
at com.dataken.common.pipeline.DatakenTask.start(DatakenTask.java:110)
at
com.dataken.tasks.objectcentricprocessor.ObjectCentricProcessTask.execute(ObjectCentricProcessTask.java:99)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
... 1 more




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-with-Spark-latest-1-2-0-build-ClassCastException-from-B-to-SerializableWritable-tp19815.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Issue with Spark latest 1.2.0 build - ClassCastException from [B to SerializableWritable

2014-11-25 Thread lokeshkumar
Hello forum, 

We are using a Spark distro built from the source of the latest 1.2.0 tag,
and we are facing the below issue while trying to act on a JavaRDD instance;
the stacktrace is given below.
Can anyone please let me know what could be wrong here?

java.lang.ClassCastException: [B cannot be cast to
org.apache.spark.SerializableWritable 
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:138) 
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 
at scala.Option.getOrElse(Option.scala:120) 
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 
at scala.Option.getOrElse(Option.scala:120) 
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 
at scala.Option.getOrElse(Option.scala:120) 
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 
at org.apache.spark.rdd.RDD.take(RDD.scala:1060) 
at
org.apache.spark.api.java.JavaRDDLike$class.take(JavaRDDLike.scala:419) 
at org.apache.spark.api.java.JavaRDD.take(JavaRDD.scala:32) 
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) 
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) 
2014-11-26 08:07:38,454 ERROR [DefaultQuartzScheduler_Worker-2]
org.quartz.core.ErrorLogger 
Job (report_report.report_report threw an exception. 

org.quartz.SchedulerException: Job threw an unhandled exception. [See nested
exception: java.lang.ClassCastException: [B cannot be cast to
org.apache.spark.SerializableWritable] 
at org.quartz.core.JobRunShell.run(JobRunShell.java:213) 
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) 
Caused by: java.lang.ClassCastException: [B cannot be cast to
org.apache.spark.SerializableWritable 
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:138) 
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 
at scala.Option.getOrElse(Option.scala:120) 
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 
at scala.Option.getOrElse(Option.scala:120) 
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) 
at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) 
at scala.Option.getOrElse(Option.scala:120) 
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) 
at org.apache.spark.rdd.RDD.take(RDD.scala:1060) 
at
org.apache.spark.api.java.JavaRDDLike$class.take(JavaRDDLike.scala:419) 
at org.apache.spark.api.java.JavaRDD.take(JavaRDD.scala:32) 
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) 
... 1 more 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-with-Spark-latest-1-2-0-build-ClassCastException-from-B-to-SerializableWritable-tp19824.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



stdout in spark applications

2014-11-04 Thread lokeshkumar
Hi Forum,

I am running a simple Spark application with 1 master and 1 worker,
submitting my application through spark-submit as a Java program. I have
System.out calls in the program, but I am not finding their output under the
stdout/stderr links in the master's web UI, nor in the SPARK_HOME/work directory.

Please let me know if I have to change any settings.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/stdout-in-spark-applications-tp18056.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: stdout in spark applications

2014-11-04 Thread lokeshkumar
Got my answer from this thread,
http://apache-spark-user-list.1001560.n3.nabble.com/no-stdout-output-from-worker-td2437.html



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/stdout-in-spark-applications-tp18056p18134.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark MLLIB Decision Tree - ArrayIndexOutOfBounds Exception

2014-10-24 Thread lokeshkumar
Hi Joseph,

Thanks for the help.

I have tried this DecisionTree example with the latest spark code and it is
working fine now. But how do we choose the maxBins for this model?

Thanks
Lokesh
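
As a reference point, below is a sketch of where maxBins is supplied in the MLlib API used by
the JavaDecisionTree example ('trainingData' is a placeholder for the existing
JavaRDD<LabeledPoint>; the numClasses, impurity, maxDepth and maxBins values are
illustrative). Roughly, maxBins controls how finely continuous features are discretized; it
has to be at least as large as the number of categories of any categorical feature, and
larger values give finer splits at the cost of memory and computation.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.mllib.tree.DecisionTree;
    import org.apache.spark.mllib.tree.model.DecisionTreeModel;

    // Fragment: 'trainingData' is the existing JavaRDD<LabeledPoint>.
    Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>();   // empty: all features continuous
    DecisionTreeModel model = DecisionTree.trainClassifier(
        trainingData,
        3,                          // numClasses (the dataset described below has three labels)
        categoricalFeaturesInfo,
        "gini",                     // impurity
        5,                          // maxDepth (illustrative)
        32);                        // maxBins (illustrative)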



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-MLLIB-Decision-Tree-ArrayIndexOutOfBounds-Exception-tp16907p17195.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark MLLIB Decision Tree - ArrayIndexOutOfBounds Exception

2014-10-21 Thread lokeshkumar
Hi All, 

I am trying to run the Spark example JavaDecisionTree code using an external
data set. It works for certain datasets only with specific maxBins and
maxDepth settings. Even for a working dataset, if I add a new data item I get
an ArrayIndexOutOfBoundsException; I get the same exception for the first
case as well (when changing maxBins and maxDepth). I am not sure what is
wrong here; can anyone please explain?

Exception stacktrace: 

14/10/21 13:47:15 ERROR executor.Executor: Exception in task 1.0 in stage
7.0 (TID 13) 
java.lang.ArrayIndexOutOfBoundsException: 6301 
at
org.apache.spark.mllib.tree.DecisionTree$.updateBinForOrderedFeature$1(DecisionTree.scala:648)
 
at
org.apache.spark.mllib.tree.DecisionTree$.binaryOrNotCategoricalBinSeqOp$1(DecisionTree.scala:706)
 
at
org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1(DecisionTree.scala:798)
 
at
org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:830)
 
at
org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:830)
 
at
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
 
at
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
 
at scala.collection.Iterator$class.foreach(Iterator.scala:727) 
at
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) 
at
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144) 
at
org.apache.spark.InterruptibleIterator.foldLeft(InterruptibleIterator.scala:28) 
at
scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201) 
at
org.apache.spark.InterruptibleIterator.aggregate(InterruptibleIterator.scala:28)
 
at
org.apache.spark.mllib.rdd.RDDFunctions$$anonfun$4.apply(RDDFunctions.scala:99) 
at
org.apache.spark.mllib.rdd.RDDFunctions$$anonfun$4.apply(RDDFunctions.scala:99) 
at
org.apache.spark.mllib.rdd.RDDFunctions$$anonfun$5.apply(RDDFunctions.scala:100)
 
at
org.apache.spark.mllib.rdd.RDDFunctions$$anonfun$5.apply(RDDFunctions.scala:100)
 
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) 
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) 
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) 
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) 
at org.apache.spark.scheduler.Task.run(Task.scala:54) 
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) 
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) 
14/10/21 13:47:15 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 7.0
(TID 13, localhost): java.lang.ArrayIndexOutOfBoundsException: 6301 
   
org.apache.spark.mllib.tree.DecisionTree$.updateBinForOrderedFeature$1(DecisionTree.scala:648)
 
   
org.apache.spark.mllib.tree.DecisionTree$.binaryOrNotCategoricalBinSeqOp$1(DecisionTree.scala:706)
 
   
org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1(DecisionTree.scala:798)
 
   
org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:830)
 
   
org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:830)
 
   
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
 
   
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
 
scala.collection.Iterator$class.foreach(Iterator.scala:727) 
   
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) 
   
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144) 
   
org.apache.spark.InterruptibleIterator.foldLeft(InterruptibleIterator.scala:28) 
   
scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201) 
   
org.apache.spark.InterruptibleIterator.aggregate(InterruptibleIterator.scala:28)
 
   
org.apache.spark.mllib.rdd.RDDFunctions$$anonfun$4.apply(RDDFunctions.scala:99) 
   
org.apache.spark.mllib.rdd.RDDFunctions$$anonfun$4.apply(RDDFunctions.scala:99) 
   
org.apache.spark.mllib.rdd.RDDFunctions$$anonfun$5.apply(RDDFunctions.scala:100)
 
   
org.apache.spark.mllib.rdd.RDDFunctions$$anonfun$5.apply(RDDFunctions.scala:100)
 
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) 
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) 
   
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 

Re: Spark MLLIB Decision Tree - ArrayIndexOutOfBounds Exception

2014-10-21 Thread lokeshkumar
Hi Joseph

I am using Spark 1.1.0, the latest version; I will try to update to the
current master and check.

The example I am running is JavaDecisionTree, and the dataset is in libsvm
format containing:

1. 45 training instances.
2. 5 features.
3. I am not sure what the feature type is, but no categorical features are
being passed in the example.
4. Three labels; not sure what the label type is.

The example runs fine with 100 as the maxBins value, but when I change this
to, say, 50 or 30 I get the exception.
Also, could you please let me know what the default value for maxBins should
be (the API says 100 as default, but it did not work in this case)?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-MLLIB-Decision-Tree-ArrayIndexOutOfBounds-Exception-tp16907p16988.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org