Re: Spark Job not using all nodes in cluster

2015-05-20 Thread Shailesh Birari
No, I am not setting the number of executors anywhere (in the env file or in the program).

Is it due to the large number of small files?

On Wed, May 20, 2015 at 5:11 PM, ayan guha  wrote:

> What does your spark env file say? Are you setting the number of executors in
> the Spark context?
> On 20 May 2015 13:16, "Shailesh Birari"  wrote:
>
>> Hi,
>>
>> I have a 4 node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB
>> of RAM.
>> I have around 600,000+ Json files on HDFS. Each file is small around 1KB
>> in
>> size. Total data is around 16GB. Hadoop block size is 256MB.
>> My application reads these files with sc.textFile() (or sc.jsonFile()
>> tried
>> both) API. But all the files are getting read by only one node (4
>> executors). Spark UI shows all 600K+ tasks on one node and 0 on other
>> nodes.
>>
>> I confirmed that all files are accessible from all nodes. Some other
>> application which uses big files uses all nodes on same cluster.
>>
>> Can you please let me know why it is behaving in such way ?
>>
>> Thanks,
>>   Shailesh
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-not-using-all-nodes-in-cluster-tp22951.html
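A minimal sketch, not taken from the thread, of how the many small files could be spread over more partitions once read, assuming an existing SparkContext named sc and a hypothetical HDFS path:

    // sc.textFile accepts a minPartitions hint; repartition() then shuffles the
    // data so that tasks exist for every executor, not just the node that read it.
    val raw = sc.textFile("hdfs:///data/json-events", 64)   // hypothetical path and hint
    val spread = raw.repartition(16)                        // e.g. 4 nodes x 4 cores
    println(spread.partitions.length)

Alternatively, sc.wholeTextFiles() can pack many small files into fewer, larger partitions; either way the aim is to give the scheduler enough non-empty partitions to keep every node busy.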


Spark Job not using all nodes in cluster

2015-05-19 Thread Shailesh Birari
Hi,

I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB of RAM.
I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB in size. The total data is around 16 GB. The Hadoop block size is 256 MB.
My application reads these files with the sc.textFile() (or sc.jsonFile(); I tried both) API. But all the files are getting read by only one node (4 executors). The Spark UI shows all 600K+ tasks on one node and 0 on the other nodes.

I confirmed that all files are accessible from all nodes. Another application that uses big files uses all nodes on the same cluster.

Can you please let me know why it is behaving this way?

Thanks,
  Shailesh




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-not-using-all-nodes-in-cluster-tp22951.html



Spark SQL Self join with aggregate

2015-03-19 Thread Shailesh Birari
Hello,

I want to use Spark SQL to aggregate some columns of the data.
E.g., I have huge data with columns:
 time, src, dst, val1, val2

I want to calculate sum(val1) and sum(val2) for all unique pairs of src and dst.

I tried forming the SQL query:
  SELECT a.time, a.src, a.dst, sum(a.val1), sum(a.val2) from table a, table b where a.src = b.src and a.dst = b.dst

I know I am doing something wrong here.

Can you please let me know if it is doable and how?

Thanks,
  Shailesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Self-join-with-agreegate-tp22151.html
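For what it is worth, a sketch of one way to express this aggregation without a self-join (a plain GROUP BY), assuming the data has been registered as a temporary table named events and that a sqlContext is in scope:

    val totals = sqlContext.sql(
      "SELECT src, dst, SUM(val1) AS sum_val1, SUM(val2) AS sum_val2 " +
      "FROM events GROUP BY src, dst")
    totals.collect().foreach(println)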



Re: Spark 1.2 – How to change Default (Random) port ….

2015-03-15 Thread Shailesh Birari
Hi SM,

Apologies for the delayed response.
No, the issue is with Spark 1.2.0; there is a bug in Spark 1.2.0.
Spark recently released 1.3.0, so it might be fixed there.
I am not planning to test it soon, maybe after some time.
You can try it.

Regards,
  Shailesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-How-to-change-Default-Random-port-tp21306p22063.html



Re: Spark 1.2 – How to change Default (Random) port ….

2015-01-26 Thread Shailesh Birari
Thanks. But after setting "spark.shuffle.blockTransferService" to "nio", the application fails with an Akka client disassociation.

15/01/27 13:38:11 ERROR TaskSchedulerImpl: Lost executor 3 on
wynchcs218.wyn.cnw.co.nz: remote Akka client disassociated
15/01/27 13:38:11 INFO TaskSetManager: Re-queueing tasks for 3 from TaskSet
0.0
15/01/27 13:38:11 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7,
wynchcs218.wyn.cnw.co.nz): ExecutorLostFailure (executor lost)
15/01/27 13:38:11 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times;
aborting job
15/01/27 13:38:11 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6,
wynchcs218.wyn.cnw.co.nz): ExecutorLostFailure (executor lost)
15/01/27 13:38:11 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
15/01/27 13:38:11 INFO TaskSchedulerImpl: Cancelling stage 0
15/01/27 13:38:11 INFO DAGScheduler: Failed to run count at
RowMatrix.scala:71
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 0.0 (TID 7, wynchcs218.wyn.cnw.co.nz):
ExecutorLostFailure (executor lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/01/27 13:38:11 INFO DAGScheduler: Executor lost: 3 (epoch 3)
15/01/27 13:38:11 INFO BlockManagerMasterActor: Trying to remove executor 3
from BlockManagerMaster.
15/01/27 13:38:11 INFO BlockManagerMaster: Removed 3 successfully in
removeExecutor



On Mon, Jan 26, 2015 at 6:34 PM, Aaron Davidson  wrote:

> This was a regression caused by Netty Block Transfer Service. The fix for
> this just barely missed the 1.2 release, and you can see the associated
> JIRA here: https://issues.apache.org/jira/browse/SPARK-4837
>
> Current master has the fix, and the Spark 1.2.1 release will have it
> included. If you don't want to rebuild from master or wait, then you can
> turn it off by setting "spark.shuffle.blockTransferService" to "nio".
>
> On Sun, Jan 25, 2015 at 6:28 PM, Shailesh Birari 
> wrote:
>
>> Can anyone please let me know ?
>> I don't want to open all ports on n/w. So, am interested in the property
>> by
>> which this new port I can configure.
>>
>>   Shailesh
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-How-to-change-Default-Random-port-tp21306p21360.html
>


Re: Spark 1.2 – How to change Default (Random) port ….

2015-01-25 Thread Shailesh Birari
Can anyone please let me know?
I don't want to open all ports on the network, so I am interested in the property by which I can configure this new port.

  Shailesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-How-to-change-Default-Random-port-tp21306p21360.html



Spark 1.2 – How to change Default (Random) port ….

2015-01-21 Thread Shailesh Birari
Hello,

Recently, I have upgraded my setup to Spark 1.2 from Spark 1.1.

I have a 4-node Ubuntu Spark cluster.
With Spark 1.1, I used to write Spark Scala programs in Eclipse on my Windows development host and submit the jobs to the Ubuntu cluster from Eclipse (the Windows machine).

As not all ports between the Spark cluster and the development machine are open on my network, I set the Spark process ports to valid ports.
On Spark 1.1 this works perfectly.

When I try to run the same program with the same user-defined ports on the Spark 1.2 cluster, it gives me a connection timeout for port *56117*.

I referred to the Spark 1.2 configuration page (http://spark.apache.org/docs/1.2.0/configuration.html) but no new ports are mentioned there.

*Here is my code for reference:*
   
val conf = new SparkConf()
  .setMaster(sparkMaster)
  .setAppName("Spark SVD")
  .setSparkHome("/usr/local/spark")
  .setJars(jars)
  .set("spark.driver.host", "consb2a")          // Windows host (development machine)
  .set("spark.driver.port", "51810")
  .set("spark.fileserver.port", "51811")
  .set("spark.broadcast.port", "51812")
  .set("spark.replClassServer.port", "51813")
  .set("spark.blockManager.port", "51814")
  .set("spark.executor.port", "51815")
  .set("spark.executor.memory", "2g")
  .set("spark.driver.memory", "4g")
val sc = new SparkContext(conf)

*Here is the exception:*
15/01/21 15:44:08 INFO BlockManagerMasterActor: Registering block manager
wynchcs217.wyn.cnw.co.nz:37173 with 1059.9 MB RAM, BlockManagerId(2,
wynchcs217.wyn.cnw.co.nz, 37173)
15/01/21 15:44:08 INFO BlockManagerMasterActor: Registering block manager
wynchcs219.wyn.cnw.co.nz:53850 with 1059.9 MB RAM, BlockManagerId(1,
wynchcs219.wyn.cnw.co.nz, 53850)
15/01/21 15:44:08 INFO BlockManagerMasterActor: Registering block manager
wynchcs220.wyn.cnw.co.nz:35670 with 1060.3 MB RAM, BlockManagerId(0,
wynchcs220.wyn.cnw.co.nz, 35670)
15/01/21 15:44:08 INFO BlockManagerMasterActor: Registering block manager
wynchcs218.wyn.cnw.co.nz:46890 with 1059.9 MB RAM, BlockManagerId(3,
wynchcs218.wyn.cnw.co.nz, 46890)
15/01/21 15:52:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
wynchcs217.wyn.cnw.co.nz): java.io.IOException: Connecting to
CONSB2A.cnw.co.nz/143.96.130.27:56117 timed out (12 ms)
at
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:188)
at
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at
org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at
org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)

15/01/21 15:52:23 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID
2, wynchcs220.wyn.cnw.co.nz, NODE_LOCAL, 1366 bytes)
15/01/21 15:55:35 INFO TaskSchedulerImpl: Cancelling stage 0
15/01/21 15:55:35 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/01/21 15:55:35 INFO DAGScheduler: Job 0 failed: count at
RowMatrix.scala:76, took 689.331309 s
Exception in thread "main" org.apache.spark.SparkException: Job 0 cancelled
because Stage 0 was cancelled


Can you please let me know how I can change port 56117 to some other port?

Thanks,
  Shailesh





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-How-to-change-Default-Random-port-tp21306.html



Re: Spark 1.2 - com/google/common/base/Preconditions java.lang.NoClassDefFoundErro

2015-01-20 Thread Shailesh Birari
Thanks Aaron.

Adding the Guava jar resolves the issue.

Shailesh

On Wed, Jan 21, 2015 at 3:26 PM, Aaron Davidson  wrote:

> Spark's network-common package depends on guava as a "provided" dependency
> in order to avoid conflicting with other libraries (e.g., Hadoop) that
> depend on specific versions. com/google/common/base/Preconditions has been
> present in Guava since version 2, so this is likely a "dependency not
> found" rather than "wrong version of dependency" issue.
>
> To resolve this, please depend on some version of Guava (14.0.1 is
> guaranteed to work, as should any other version from the past few years).
>
> On Tue, Jan 20, 2015 at 6:16 PM, Shailesh Birari 
> wrote:
>
>> Hi Frank,
>>
>> Its a normal eclipse project where I added Scala and Spark libraries as
>> user libraries.
>> Though, I am not attaching any hadoop libraries, in my application code I
>> have following line.
>>
>>   System.setProperty("hadoop.home.dir", "C:\\SB\\HadoopWin")
>>
>> This Hadoop home dir contains "winutils.exe" only.
>>
>> Don't think that its an issue.
>>
>> Please suggest.
>>
>> Thanks,
>>   Shailesh
>>
>>
>> On Wed, Jan 21, 2015 at 2:19 PM, Frank Austin Nothaft <
>> fnoth...@berkeley.edu> wrote:
>>
>>> Shailesh,
>>>
>>> To add, are you packaging Hadoop in your app? Hadoop will pull in Guava.
>>> Not sure if you are using Maven (or what) to build, but if you can pull up
>>> your builds dependency tree, you will likely find com.google.guava being
>>> brought in by one of your dependencies.
>>>
>>> Regards,
>>>
>>> Frank Austin Nothaft
>>> fnoth...@berkeley.edu
>>> fnoth...@eecs.berkeley.edu
>>> 202-340-0466
>>>
>>> On Jan 20, 2015, at 5:13 PM, Shailesh Birari 
>>> wrote:
>>>
>>> Hello,
>>>
>>> I double checked the libraries. I am linking only with Spark 1.2.
>>> Along with Spark 1.2 jars I have Scala 2.10 jars and JRE 7 jars linked
>>> and nothing else.
>>>
>>> Thanks,
>>>Shailesh
>>>
>>> On Wed, Jan 21, 2015 at 12:58 PM, Sean Owen  wrote:
>>>
>>>> Guava is shaded in Spark 1.2+. It looks like you are mixing versions
>>>> of Spark then, with some that still refer to unshaded Guava. Make sure
>>>> you are not packaging Spark with your app and that you don't have
>>>> other versions lying around.
>>>>
>>>> On Tue, Jan 20, 2015 at 11:55 PM, Shailesh Birari 
>>>> wrote:
>>>> > Hello,
>>>> >
>>>> > I recently upgraded my setup from Spark 1.1 to Spark 1.2.
>>>> > My existing applications are working fine on ubuntu cluster.
>>>> > But, when I try to execute Spark MLlib application from Eclipse
>>>> (Windows
>>>> > node) it gives java.lang.NoClassDefFoundError:
>>>> > com/google/common/base/Preconditions exception.
>>>> >
>>>> > Note,
>>>> >1. With Spark 1.1 this was working fine.
>>>> >2. The Spark 1.2 jar files are linked in Eclipse project.
>>>> >3. Checked the jar -tf output and found the above
>>>> com.google.common.base
>>>> > is not present.
>>>> >
>>>> >
>>>> -Exception
>>>> > log:
>>>> >
>>>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>>> > com/google/common/base/Preconditions
>>>> > at
>>>> >
>>>> org.apache.spark.network.client.TransportClientFactory.(TransportClientFactory.java:94)
>>>> > at
>>>> >
>>>> org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:77)
>>>> > at
>>>> >
>>>> org.apache.spark.network.netty.NettyBlockTransferService.init(NettyBlockTransferService.scala:62)
>>>> > at
>>>> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:194)
>>>> > at
>>>> org.apache.spark.SparkContext.(SparkContext.scala:340)
>>>> > at
>>>> >
>>>> org.apache.spark.examples.mllib.TallSkinnySVD$.main(TallSkinnySVD.scala:74)
>>>> > at
>>>> org.apache.spark.examples.mllib.TallSkinnySVD.main(TallSkinnySVD.scala)
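As a sketch of Aaron's suggestion, adding Guava explicitly to the build might look like this (shown for an sbt build, which is an assumption; the Maven coordinates are the same):

    // build.sbt: any reasonably recent Guava version should work, per the thread.
    libraryDependencies += "com.google.guava" % "guava" % "14.0.1"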

Re: Spark 1.2 - com/google/common/base/Preconditions java.lang.NoClassDefFoundErro

2015-01-20 Thread Shailesh Birari
Hi Frank,

It's a normal Eclipse project where I added the Scala and Spark libraries as user libraries.
Though I am not attaching any Hadoop libraries, in my application code I have the following line:

  System.setProperty("hadoop.home.dir", "C:\\SB\\HadoopWin")

This Hadoop home dir contains only "winutils.exe".

I don't think that it's an issue.

Please suggest.

Thanks,
  Shailesh


On Wed, Jan 21, 2015 at 2:19 PM, Frank Austin Nothaft  wrote:

> Shailesh,
>
> To add, are you packaging Hadoop in your app? Hadoop will pull in Guava.
> Not sure if you are using Maven (or what) to build, but if you can pull up
> your builds dependency tree, you will likely find com.google.guava being
> brought in by one of your dependencies.
>
> Regards,
>
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
>
> On Jan 20, 2015, at 5:13 PM, Shailesh Birari  wrote:
>
> Hello,
>
> I double checked the libraries. I am linking only with Spark 1.2.
> Along with Spark 1.2 jars I have Scala 2.10 jars and JRE 7 jars linked and
> nothing else.
>
> Thanks,
>Shailesh
>
> On Wed, Jan 21, 2015 at 12:58 PM, Sean Owen  wrote:
>
>> Guava is shaded in Spark 1.2+. It looks like you are mixing versions
>> of Spark then, with some that still refer to unshaded Guava. Make sure
>> you are not packaging Spark with your app and that you don't have
>> other versions lying around.
>>
>> On Tue, Jan 20, 2015 at 11:55 PM, Shailesh Birari 
>> wrote:
>> > Hello,
>> >
>> > I recently upgraded my setup from Spark 1.1 to Spark 1.2.
>> > My existing applications are working fine on ubuntu cluster.
>> > But, when I try to execute Spark MLlib application from Eclipse (Windows
>> > node) it gives java.lang.NoClassDefFoundError:
>> > com/google/common/base/Preconditions exception.
>> >
>> > Note,
>> >1. With Spark 1.1 this was working fine.
>> >2. The Spark 1.2 jar files are linked in Eclipse project.
>> >3. Checked the jar -tf output and found the above
>> com.google.common.base
>> > is not present.
>> >
>> >
>> -Exception
>> > log:
>> >
>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>> > com/google/common/base/Preconditions
>> > at
>> >
>> org.apache.spark.network.client.TransportClientFactory.(TransportClientFactory.java:94)
>> > at
>> >
>> org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:77)
>> > at
>> >
>> org.apache.spark.network.netty.NettyBlockTransferService.init(NettyBlockTransferService.scala:62)
>> > at
>> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:194)
>> > at org.apache.spark.SparkContext.(SparkContext.scala:340)
>> > at
>> >
>> org.apache.spark.examples.mllib.TallSkinnySVD$.main(TallSkinnySVD.scala:74)
>> > at
>> org.apache.spark.examples.mllib.TallSkinnySVD.main(TallSkinnySVD.scala)
>> > Caused by: java.lang.ClassNotFoundException:
>> > com.google.common.base.Preconditions
>> > at java.net.URLClassLoader$1.run(Unknown Source)
>> > at java.net.URLClassLoader$1.run(Unknown Source)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at java.net.URLClassLoader.findClass(Unknown Source)
>> > at java.lang.ClassLoader.loadClass(Unknown Source)
>> > at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>> > at java.lang.ClassLoader.loadClass(Unknown Source)
>> > ... 7 more
>> >
>> >
>> -
>> >
>> > jar -tf output:
>> >
>> >
>> > consb2@CONSB2A
>> > /cygdrive/c/SB/spark-1.2.0-bin-hadoop2.4/spark-1.2.0-bin-hadoop2.4/lib
>> > $ jar -tf spark-assembly-1.2.0-hadoop2.4.0.jar | grep Preconditions
>> > org/spark-project/guava/common/base/Preconditions.class
>> > org/spark-project/guava/common/math/MathPreconditions.class
>> > com/clearspring/analytics/util/Preconditions.class
>> > parquet/Preconditions.class
>> > com/google/inject/internal/util/$Preconditions.class
>> >
>> >
>> ---
>> >
>> > Please help me in resolving this.
>> >
>> > Thanks,
>> >   Shailesh
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-com-google-common-base-Preconditions-java-lang-NoClassDefFoundErro-tp21271.html
>>
>
>
>


Re: Spark 1.2 - com/google/common/base/Preconditions java.lang.NoClassDefFoundErro

2015-01-20 Thread Shailesh Birari
Hello,

I double-checked the libraries. I am linking only with Spark 1.2.
Along with the Spark 1.2 jars I have the Scala 2.10 jars and JRE 7 jars linked, and nothing else.

Thanks,
   Shailesh

On Wed, Jan 21, 2015 at 12:58 PM, Sean Owen  wrote:

> Guava is shaded in Spark 1.2+. It looks like you are mixing versions
> of Spark then, with some that still refer to unshaded Guava. Make sure
> you are not packaging Spark with your app and that you don't have
> other versions lying around.
>
> On Tue, Jan 20, 2015 at 11:55 PM, Shailesh Birari 
> wrote:
> > Hello,
> >
> > I recently upgraded my setup from Spark 1.1 to Spark 1.2.
> > My existing applications are working fine on ubuntu cluster.
> > But, when I try to execute Spark MLlib application from Eclipse (Windows
> > node) it gives java.lang.NoClassDefFoundError:
> > com/google/common/base/Preconditions exception.
> >
> > Note,
> >1. With Spark 1.1 this was working fine.
> >2. The Spark 1.2 jar files are linked in Eclipse project.
> >3. Checked the jar -tf output and found the above
> com.google.common.base
> > is not present.
> >
> >
> -Exception
> > log:
> >
> > Exception in thread "main" java.lang.NoClassDefFoundError:
> > com/google/common/base/Preconditions
> > at
> >
> org.apache.spark.network.client.TransportClientFactory.(TransportClientFactory.java:94)
> > at
> >
> org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:77)
> > at
> >
> org.apache.spark.network.netty.NettyBlockTransferService.init(NettyBlockTransferService.scala:62)
> > at
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:194)
> > at org.apache.spark.SparkContext.(SparkContext.scala:340)
> > at
> >
> org.apache.spark.examples.mllib.TallSkinnySVD$.main(TallSkinnySVD.scala:74)
> > at
> org.apache.spark.examples.mllib.TallSkinnySVD.main(TallSkinnySVD.scala)
> > Caused by: java.lang.ClassNotFoundException:
> > com.google.common.base.Preconditions
> > at java.net.URLClassLoader$1.run(Unknown Source)
> > at java.net.URLClassLoader$1.run(Unknown Source)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(Unknown Source)
> > at java.lang.ClassLoader.loadClass(Unknown Source)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> > at java.lang.ClassLoader.loadClass(Unknown Source)
> > ... 7 more
> >
> >
> -
> >
> > jar -tf output:
> >
> >
> > consb2@CONSB2A
> > /cygdrive/c/SB/spark-1.2.0-bin-hadoop2.4/spark-1.2.0-bin-hadoop2.4/lib
> > $ jar -tf spark-assembly-1.2.0-hadoop2.4.0.jar | grep Preconditions
> > org/spark-project/guava/common/base/Preconditions.class
> > org/spark-project/guava/common/math/MathPreconditions.class
> > com/clearspring/analytics/util/Preconditions.class
> > parquet/Preconditions.class
> > com/google/inject/internal/util/$Preconditions.class
> >
> >
> ---
> >
> > Please help me in resolving this.
> >
> > Thanks,
> >   Shailesh
> >
> >
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-com-google-common-base-Preconditions-java-lang-NoClassDefFoundErro-tp21271.html
> >
>


Spark 1.2 - com/google/common/base/Preconditions java.lang.NoClassDefFoundErro

2015-01-20 Thread Shailesh Birari
Hello,

I recently upgraded my setup from Spark 1.1 to Spark 1.2.
My existing applications are working fine on the Ubuntu cluster.
But when I try to execute a Spark MLlib application from Eclipse (a Windows node) it gives a java.lang.NoClassDefFoundError:
com/google/common/base/Preconditions exception.

Note,
   1. With Spark 1.1 this was working fine.
   2. The Spark 1.2 jar files are linked in the Eclipse project.
   3. I checked the jar -tf output and found that the above com.google.common.base is not present.

----- Exception log:

Exception in thread "main" java.lang.NoClassDefFoundError:
com/google/common/base/Preconditions
at
org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:94)
at
org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:77)
at
org.apache.spark.network.netty.NettyBlockTransferService.init(NettyBlockTransferService.scala:62)
at 
org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:194)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:340)
at
org.apache.spark.examples.mllib.TallSkinnySVD$.main(TallSkinnySVD.scala:74)
at 
org.apache.spark.examples.mllib.TallSkinnySVD.main(TallSkinnySVD.scala)
Caused by: java.lang.ClassNotFoundException:
com.google.common.base.Preconditions
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 7 more

-

jar -tf output:


consb2@CONSB2A
/cygdrive/c/SB/spark-1.2.0-bin-hadoop2.4/spark-1.2.0-bin-hadoop2.4/lib
$ jar -tf spark-assembly-1.2.0-hadoop2.4.0.jar | grep Preconditions
org/spark-project/guava/common/base/Preconditions.class
org/spark-project/guava/common/math/MathPreconditions.class
com/clearspring/analytics/util/Preconditions.class
parquet/Preconditions.class
com/google/inject/internal/util/$Preconditions.class

---

Please help me in resolving this.

Thanks,
  Shailesh




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-com-google-common-base-Preconditions-java-lang-NoClassDefFoundErro-tp21271.html



Re: Submitting Spark job on Unix cluster from dev environment (Windows)

2014-11-09 Thread Shailesh Birari
Have you tried setting the host name/port to your Windows machine?
Also specify the following ports for Spark. Make sure the ports you mention are not blocked (on the Windows machine).

spark.fileserver.port
spark.broadcast.port
spark.replClassServer.port
spark.blockManager.port
spark.executor.port

Hope this helps.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-tp16989p18436.html
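A sketch of how those properties might be set on a SparkConf when the driver runs on the Windows machine; the host name, master URL and port numbers below are placeholders, not values from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://cluster-master:7077")        // hypothetical master URL
      .setAppName("WindowsDriverJob")
      .set("spark.driver.host", "my-windows-host")     // hypothetical Windows host name
      .set("spark.driver.port", "51810")
      .set("spark.fileserver.port", "51811")
      .set("spark.broadcast.port", "51812")
      .set("spark.replClassServer.port", "51813")
      .set("spark.blockManager.port", "51814")
      .set("spark.executor.port", "51815")
    val sc = new SparkContext(conf)

Whatever ports are chosen must be reachable in both directions between the Windows driver and the cluster nodes.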



Re: Spark SQL takes unexpected time

2014-11-03 Thread Shailesh Birari
Yes, I am using Spark 1.1.0 and have used rdd.registerTempTable().
I tried adding sqlContext.cacheTable(), but it took 59 seconds (more than before).

I also tried changing the schema to use the Long data type in some fields, but it seems the conversion takes more time.
Is there any way to specify an index? I checked and didn't find one, but just want to confirm.

For your reference here is the snippet of code.

-
case class EventDataTbl(EventUID: Long,
                        ONum: Long,
                        RNum: Long,
                        Timestamp: java.sql.Timestamp,
                        Duration: String,
                        Type: String,
                        Source: String,
                        OName: String,
                        RName: String)

val format = new java.text.SimpleDateFormat("yyyy-MM-dd hh:mm:ss")
val cedFileName = "hdfs://hadoophost:8020/demo/poc/JoinCsv/output_2"
val cedRdd = sc.textFile(cedFileName).map(_.split(",", -1)).map(p =>
  EventDataTbl(p(0).toLong, p(1).toLong, p(2).toLong,
    new java.sql.Timestamp(format.parse(p(3)).getTime()),
    p(4), p(5), p(6), p(7), p(8)))

cedRdd.registerTempTable("EventDataTbl")
sqlCntxt.cacheTable("EventDataTbl")

val t1 = System.nanoTime()
println("\n\n10 Most frequent conversations between the Originators and Recipients\n")
sql("SELECT COUNT(*) AS Frequency,ONum,OName,RNum,RName FROM EventDataTbl GROUP BY ONum,OName,RNum,RName ORDER BY Frequency DESC LIMIT 10").collect().foreach(println)
val t2 = System.nanoTime()
println("Time taken " + (t2 - t1) / 1e9 + " Seconds")   // nanoseconds to seconds

-

Thanks,
  Shailesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-takes-unexpected-time-tp17925p18017.html
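One detail that may explain the 59 seconds: in Spark 1.1, sqlContext.cacheTable() is lazy, so the first query after it still pays the full scan-and-cache cost. A small sketch, under that assumption, of timing only the cached path (reusing sqlCntxt and sql from the snippet above):

    sqlCntxt.cacheTable("EventDataTbl")
    // Run one cheap query first so the in-memory columnar cache is populated...
    sql("SELECT COUNT(*) FROM EventDataTbl").collect()
    // ...then time queries against the cached table.
    val start = System.nanoTime()
    sql("SELECT COUNT(*) AS Frequency,ONum,OName,RNum,RName FROM EventDataTbl GROUP BY ONum,OName,RNum,RName ORDER BY Frequency DESC LIMIT 10").collect().foreach(println)
    println("Time taken " + (System.nanoTime() - start) / 1e9 + " Seconds")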



Spark SQL takes unexpected time

2014-11-02 Thread Shailesh Birari
Hello,

I have written a Spark SQL application which reads data from HDFS and queries it.
The data size is around 2 GB (30 million records). The schema and query I am running are below.
The query takes around 05+ seconds to execute.
I tried adding
   rdd.persist(StorageLevel.MEMORY_AND_DISK)
and
   rdd.cache()
but in both cases it takes extra time, even if I issue the query below a second time on the same data (assuming Spark will have cached it for the first query).

case class EventDataTbl(ID: String,
                        ONum: String,
                        RNum: String,
                        Timestamp: String,
                        Duration: String,
                        Type: String,
                        Source: String,
                        OName: String,
                        RName: String)

sql("SELECT COUNT(*) AS Frequency,ONum,OName,RNum,RName FROM EventDataTbl GROUP BY ONum,OName,RNum,RName ORDER BY Frequency DESC LIMIT 10").collect().foreach(println)

Can you let me know if I am missing anything?

Thanks,
  Shailesh




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-takes-unexpected-time-tp17925.html



Re: Submitting Spark job on Unix cluster from dev environment (Windows)

2014-10-29 Thread Shailesh Birari
Thanks. By setting the driver host to the Windows machine and specifying some ports (driver, fileserver, broadcast, etc.) it worked perfectly. I need to specify those ports because not all ports are open on my machine.

For the driver host name, I was assuming Spark would determine it, as in the Linux case we do not set it. I was thinking it is only needed when we want to set the driver host to something other than the machine from which we are running the program.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-tp16989p17693.html



Re: Submitting Spark application through code

2014-10-28 Thread Shailesh Birari
Yes, this is doable.
I am submitting the Spark job using:

    JavaSparkContext spark = new JavaSparkContext(sparkMaster,
            "app name", System.getenv("SPARK_HOME"),
            new String[] { "application JAR" });

To run this, first create the application JAR and specify its absolute path in the above API.
That's all. Run your Java application like any other.

  Shailesh




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submiting-Spark-application-through-code-tp17452p17553.html
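A rough Scala sketch of the same idea, with a hypothetical master URL and JAR path; setSparkHome and setJars mirror the arguments of the JavaSparkContext constructor above:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://cluster-master:7077")            // hypothetical master URL
      .setAppName("app name")
      .setSparkHome(sys.env.getOrElse("SPARK_HOME", "/usr/local/spark"))
      .setJars(Seq("/absolute/path/to/application.jar"))   // hypothetical JAR path
    val sc = new SparkContext(conf)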



Re: Submitting Spark job on Unix cluster from dev environment (Windows)

2014-10-28 Thread Shailesh Birari
Can anyone please help me here?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-tp16989p17552.html



Re: Submitting Spark job on cluster from dev environment

2014-10-27 Thread Shailesh Birari
Some more updates.

Now I tried setting spark.driver.host to the Spark master node and spark.driver.port to 51800 (an available open port), but it fails with a bind error. I was hoping that it would start the driver on the supplied host:port, and as it is a Unix node there should not be any issue.
Can you please suggest where I am going wrong?

Here is the stack trace obtained in Eclipse:

14/10/28 17:13:37 INFO Remoting: Starting remoting
Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed
to bind to: wynchcs220.wyn.cnw.co.nz/143.96.25.30:51800
at 
org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at
akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at
akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.BindException: Cannot assign requested address: bind
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Unknown Source)
at sun.nio.ch.Net.bind(Unknown Source)
at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
at
org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at
org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
14/10/28 17:13:37 ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at
akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
at
akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
at
org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1446)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1442)
at 
org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:153)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:203)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
at org.wyn.RemoteWordCount.main(RemoteWordCount.java:51)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to:
wynchcs220.wyn.cnw.co.nz/143.96.25.30:51800
at

Re: Submitting Spark job on cluster from dev environment

2014-10-27 Thread Shailesh Birari
Hello,

I am able to submit the job to the Spark cluster from the Windows desktop, but the executors are not able to run.
When I check the Spark UI (which is on Windows, as the driver is there) it shows me JAVA_HOME, CLASS_PATH and other environment variables related to Windows.
I tried setting spark.executor.extraClassPath to the Unix classpath, and
spark.getConf().setExecutorEnv("spark.executorEnv.JAVA_HOME", "/usr/lib/jvm/java-6-openjdk-amd64")
within the program, but it didn't help.

The executor gives the error below. Can you please let me know what I am missing?

14/10/28 09:36:00 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1563)
at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
at
org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
at
org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after
[30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
... 4 more


Thanks,
  Shailesh





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-cluster-from-dev-environment-tp16989p17397.html
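One small observation, offered as a sketch rather than a confirmed fix for the timeout above: SparkConf.setExecutorEnv() already prepends the spark.executorEnv. prefix, so the variable name is normally passed without it, and it has to be set on the conf before the context is created:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setExecutorEnv("JAVA_HOME", "/usr/lib/jvm/java-6-openjdk-amd64")
      // equivalent long form:
      // .set("spark.executorEnv.JAVA_HOME", "/usr/lib/jvm/java-6-openjdk-amd64")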



Re: Spark Streaming - How to write RDD's in same directory ?

2014-10-21 Thread Shailesh Birari
Thanks Sameer for the quick reply.

I will try to implement it.

  Shailesh




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-How-to-write-RDD-s-in-same-directory-tp16962p16970.html



Spark Streaming - How to write RDD's in same directory ?

2014-10-21 Thread Shailesh Birari
Hello,

Spark 1.1.0, Hadoop 2.4.1

I have written a Spark Streaming application, and I am getting a FileAlreadyExistsException from rdd.saveAsTextFile(outputFolderPath).
Here, briefly, is what I am trying to do.
My application creates a text file stream using the Java streaming context. The input file is on HDFS.

JavaDStream<String> textStream = ssc.textFileStream(InputFile);

Then it compares each line of the input stream with some data and filters it. The filtered data I am storing in a JavaDStream<String>.

    JavaDStream<String> suspectedStream = textStream.flatMap(new FlatMapFunction<String, String>() {
        @Override
        public Iterable<String> call(String line) throws Exception {

            List<String> filteredList = new ArrayList<String>();

            // doing filter job

            return filteredList;
        }
    });

And this filteredList I am storing in HDFS as:

    suspectedStream.foreach(new Function<JavaRDD<String>, Void>() {
        @Override
        public Void call(JavaRDD<String> rdd) throws Exception {
            rdd.saveAsTextFile(outputFolderPath);
            return null;
        }
    });


But with this I am receiving org.apache.hadoop.mapred.FileAlreadyExistsException.

I tried appending a random number to outputFolderPath and that works, but my requirement is to collect all output under one directory.

Can you please suggest a way to get rid of this exception?

Thanks,
  Shailesh




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-How-to-write-RDD-s-in-same-directory-tp16962.html
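A minimal sketch, in Scala and not taken from the thread, of one common workaround: write each batch into its own time-stamped subdirectory under a single parent folder, since saveAsTextFile() refuses to overwrite an existing directory. The paths and the filter below are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

    object BatchedOutput {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("BatchedOutput"), Seconds(30))
        // Placeholder input path and filter, standing in for the real filtering job.
        val suspected = ssc.textFileStream("hdfs:///demo/input").filter(_.contains("suspect"))
        suspected.foreachRDD { (rdd: RDD[String], time: Time) =>
          // One output directory per batch, all under the same parent folder.
          rdd.saveAsTextFile("hdfs:///demo/output/batch-" + time.milliseconds)
        }
        ssc.start()
        ssc.awaitTermination()
      }
    }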



Re: java.lang.OutOfMemoryError while running SVD MLLib example

2014-09-25 Thread Shailesh Birari
Hi Xianguri,

After setting the SVD to a smaller value (200) it's working.

Thanks,
  Shailesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-while-running-SVD-MLLib-example-tp14972p15179.html
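A sketch of what "setting the SVD to a smaller value" looks like in code, assuming a RowMatrix built from dense comma-separated rows (the input path and parsing are placeholders); requesting only k = 200 singular values is what reportedly avoided the OutOfMemoryError here:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Assumes an existing SparkContext named sc.
    val rows = sc.textFile("hdfs:///demo/svd-input")
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
    val mat = new RowMatrix(rows)
    val svd = mat.computeSVD(200, computeU = true)   // k = 200 instead of the full column count
    println(svd.s.size)                              // number of singular values actually returned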



Re: java.lang.OutOfMemoryError while running SVD MLLib example

2014-09-24 Thread Shailesh Birari

Note: the data is random numbers (doubles).
Any suggestions/pointers would be highly appreciated.

Thanks,
  Shailesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-while-running-SVD-MLLib-example-tp14972p15083.html