Re: SparkRDD with Ignite

2016-12-01 Thread vkulichenko
Vidhya,

Are you using Spring Cache? Because that's what this issue is about. I'm not
sure how this is related to your use case.

-Val





Re: SparkRDD with Ignite

2016-12-01 Thread vkulichenko
Vidhya,

I'm not aware of this issue, but it sounds pretty serious. Are there any
tickets and/or discussions about this you can point me to?

-Val





Re: SparkRDD with Ignite

2016-11-30 Thread Vidhya Gurumoorthi (vgurumoo)
Thanks for the reply, Val.

Looks like the issue is with having "null" fields in the database. I read that
null fields are handled in the Ignite 1.8 release. Is that right?
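
In the meantime, a stopgap we are considering is to scrub the nulls on the
Spark side before saving; a rough, untested sketch (df2 is the JDBC DataFrame
from earlier in this thread):

// Replace nulls in string columns with an empty string before
// converting to an RDD, so no null fields reach Ignite 1.6/1.7
val noNulls = df2.na.fill("")
val rowsNoNulls = noNulls.rdd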


Thanks,
Vidhya

From: vkulichenko <valentin.kuliche...@gmail.com>
Sent: Wednesday, November 30, 2016 3:43 PM
To: user@ignite.apache.org
Subject: Re: SparkRDD with Ignite

It depends on how big your data set is, where the data is coming from, etc.
I don't think Ignite is a bottleneck here.

-Val





Re: SparkRDD with Ignite

2016-11-30 Thread vkulichenko
It depends on how big your data set is, where the data is coming from, etc.
I don't think Ignite is a bottleneck here.

-Val





Re: SparkRDD with Ignite

2016-11-23 Thread Vidhya Gurumoorthi (vgurumoo)
We have Spark version 1.6.1 with apache-ignite-1.6.0-src. Spark is installed
on all 10 nodes, but Ignite is installed on only one node (the master). Is
that an issue?

I tried the lines below and it has been running for the past 20 minutes
without any errors...

scala> val rdd = df2.map(row => (row.getAs[String]("DRIVER_TYPE"), row))
rdd: org.apache.spark.rdd.RDD[(String, org.apache.spark.sql.Row)] = MapPartitionsRDD[9] at map at <console>:42

scala> igniteRDD.savePairs(rdd)

Is it expected to take this long?

See below from the Ignite log console:
[23-11-2016 17:32:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=7a4a9bc9, name=null, uptime=02:12:00:684]
^-- H/N/C [hosts=1, nodes=2, CPUs=16]
^-- CPU [cur=0.07%, avg=0.04%, GC=0%]
^-- Heap [used=222MB, free=77.3%, comm=982MB]
^-- Non heap [used=33MB, free=89.04%, comm=33MB]
^-- Public thread pool [active=0, idle=32, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
^-- Outbound messages queue [size=0]
[23-11-2016 17:33:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=7a4a9bc9, name=null, uptime=02:13:00:684]
^-- H/N/C [hosts=1, nodes=2, CPUs=16]
^-- CPU [cur=0.03%, avg=0.04%, GC=0%]
^-- Heap [used=229MB, free=76.62%, comm=982MB]
^-- Non heap [used=33MB, free=89.03%, comm=33MB]
^-- Public thread pool [active=0, idle=32, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
^-- Outbound messages queue [size=0]
[23-11-2016 17:34:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=7a4a9bc9, name=null, uptime=02:14:00:685]
^-- H/N/C [hosts=1, nodes=2, CPUs=16]
^-- CPU [cur=0.03%, avg=0.04%, GC=0%]
^-- Heap [used=235MB, free=75.98%, comm=982MB]
^-- Non heap [used=33MB, free=89.03%, comm=33MB]
^-- Public thread pool [active=0, idle=32, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
^-- Outbound messages queue [size=0]
[23-11-2016 17:35:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=7a4a9bc9, name=null, uptime=02:15:00:687]
^-- H/N/C [hosts=1, nodes=2, CPUs=16]
^-- CPU [cur=0.07%, avg=0.04%, GC=0%]
^-- Heap [used=242MB, free=75.33%, comm=982MB]
^-- Non heap [used=33MB, free=89.03%, comm=33MB]
^-- Public thread pool [active=0, idle=32, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
^-- Outbound messages queue [size=0]
[23-11-2016 17:36:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=7a4a9bc9, name=null, uptime=02:16:00:691]
^-- H/N/C [hosts=1, nodes=2, CPUs=16]
^-- CPU [cur=0.07%, avg=0.04%, GC=0%]
^-- Heap [used=246MB, free=74.9%, comm=982MB]
^-- Non heap [used=33MB, free=89.03%, comm=33MB]
^-- Public thread pool [active=0, idle=32, qSize=0]
^-- System thread pool [active=0, idle=32, qSize=0]
^-- Outbound messages queue [size=0]

Thanks,
Vidhya


From: vkulichenko <valentin.kuliche...@gmail.com>
Sent: Wednesday, November 23, 2016 3:07 PM
To: user@ignite.apache.org
Subject: Re: SparkRDD with Ignite

What version of Scala do you have? Also why are you on Ignite 1.6? Can you
switch to 1.7?

-Val





Re: SparkRDD with Ignite

2016-11-23 Thread vkulichenko
What version of Scala do you have? Also why are you on Ignite 1.6? Can you
switch to 1.7?
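
A NoSuchMethodError on scala.Predef is usually a sign that the Scala version
of the ignite-spark module doesn't match the Scala version Spark runs on
(Spark 1.6.x is built against 2.10 by default). A quick way to check from the
shell:

scala> scala.util.Properties.versionString

If it reports 2.10, the Scala 2.10 build of the Ignite Spark module should be
on the classpath instead of the 2.11 one (if I remember correctly, Ignite
ships it as a separate ignite-spark_2.10 artifact).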

-Val





Re: SparkRDD with Ignite

2016-11-23 Thread Vidhya Gurumoorthi (vgurumoo)
Thanks, Val. Yes, the executor was missing the driver. I added the driver,
which eliminated the missing-driver warning, and now I see
java.lang.NoSuchMethodError at
org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues. This shows up as a
WARN, though; the only ERROR I see is the executor loss ("containers exceeding
thresholds"). Any inputs here will be of great help.
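
For reference, this is roughly how we pass the Oracle driver jar now (the jar
path is just an example):

# Make the JDBC driver visible to both the driver and the executors
spark-shell --jars /path/to/ojdbc6.jar --driver-class-path /path/to/ojdbc6.jar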


scala> sharedRDD.saveValues(df2.rdd)
[15:28:36] Topology snapshot [ver=5, servers=2, clients=1, CPUs=16, heap=3.0GB]
16/11/23 15:28:37 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 5, finact-poc-001): java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:151)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:150)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:150)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:138)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

[15:28:37] Topology snapshot [ver=6, servers=1, clients=1, CPUs=16, heap=2.0GB]
16/11/23 15:28:37 ERROR TaskSchedulerImpl: Lost executor 7 on finact-poc-001: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/23 15:28:46 ERROR TaskSchedulerImpl: Lost executor 11 on finact-poc-004.cisco.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/23 15:28:54 WARN TaskSetManager: Lost task 0.2 in stage 2.0 (TID 7, finact-poc-002.cisco.com): java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:151)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:150)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:150)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:138)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)



Thanks,
Vidhya

From: vkulichenko <valentin.kuliche...@gmail.com>
Sent: Wednesday, November 23, 2016 12:18 PM
To: user@ignite.apache.org
Subject: Re: SparkRDD with Ignite

Where is this exception failing? Is it on an executor node?

Does it work if you execute something like foreachPartition on the original
RDD? For now it looks like the executors are simply missing the Oracle driver
and therefore can't load rows from the database. Ignite is not even touched
yet at this point.

-Val





Re: SparkRDD with Ignite

2016-11-23 Thread vkulichenko
Where is this exception failing? Is it on an executor node?

Does it work if you execute something like foreachPartition on the original
RDD? For now it looks like the executors are simply missing the Oracle driver
and therefore can't load rows from the database. Ignite is not even touched
yet at this point.
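
For example, something along these lines forces every partition to be
computed on the executors without touching Ignite at all, so a missing JDBC
driver would fail in exactly the same way (assuming rows is the RDD[Row] from
df2.rdd):

scala> rows.foreachPartition(it => it.foreach(_ => ()))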

-Val





SparkRDD with Ignite

2016-11-23 Thread Vidhya Gurumoorthi (vgurumoo)
We have been extensively trying to integrate Spark RDDs with Ignite. We have
Spark version 1.6.1 with apache-ignite-1.6.0-src. We tried out the example in
"https://apacheignite-fs.readme.io/docs/testing-integration-with-spark-shell"
and it is working fine. However, all efforts w.r.t. connecting to Spark
DataFrames/RDDs have been fruitless. We have a 10-node Spark/Hadoop cluster.
The steps are listed below; any inputs on what is missing will be of great
help. Also, please suggest whether there is a better approach than converting
DataFrames to RDDs and using Ignite saveValues; for instance, would keying the
rows and using savePairs, as sketched below, be preferable?
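
A rough sketch of the savePairs alternative we have in mind (the key column
is from our SAMPLE_DATA table; in practice a unique key column would be
better, since duplicate keys overwrite earlier cache entries):

import org.apache.ignite.spark._
import org.apache.ignite.configuration._

// Pair each Row with an explicit key instead of letting Ignite
// generate keys, then write the pairs into the cache
val pairs = df2.rdd.map(row => (row.getAs[String]("DRIVER_TYPE"), row))
val pairContext = new IgniteContext[String, org.apache.spark.sql.Row](sc, () => new IgniteConfiguration())
pairContext.fromCache("partitioned").savePairs(pairs)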


1. val df2 = sqlContext.read.jdbc(url, "SAMPLE_DATA", props) // DataFrame that reads the Oracle SAMPLE_DATA table

df2.show() returns all the values in the table. We are good from the Spark
perspective at this point.
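
For completeness, the connection properties look roughly like this (the
values are placeholders); note the "driver" property, which becomes relevant
later in this thread:

val props = new java.util.Properties()
props.setProperty("user", "scott")       // placeholder
props.setProperty("password", "tiger")   // placeholder
props.setProperty("driver", "oracle.jdbc.OracleDriver")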


2.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

val rows: RDD[Row] = df2.rdd // Converting the DataFrame to an RDD of Rows


Setting up the Ignite context:


1. scala> import org.apache.ignite.spark._

2. scala> import org.apache.ignite.configuration._

3. scala> val igniteContext = new IgniteContext[org.apache.spark.sql.Row,org.apache.spark.sql.Row](sc, () => new IgniteConfiguration())

4. scala> val sharedRDD = igniteContext.fromCache("partitioned")

5. scala> sharedRDD.saveValues(rows: RDD[Row])
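
For reference, our understanding of the saveValues semantics (paraphrasing
the IgniteRDD scaladoc):

// saveValues writes each value into the cache under an automatically
// generated key, so entries can't be looked up by a business key
// afterwards; savePairs (sketched earlier) keeps explicit keys.
sharedRDD.saveValues(rows)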


After step 5, we hit the error below:


scala> sharedRDD.saveValues(rows: RDD[Row])

16/11/23 13:23:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, xxx006.xxx.com): java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


[13:24:00] Topology snapshot [ver=28, servers=3, clients=1, CPUs=16, heap=4.0GB]

16/11/23 13:24:01 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, xxx001): java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:151)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:150)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:150)
at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:138)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


[13:24:01] Topology snapshot [ver=29, servers=2, clients=1, CPUs=16, heap=3.0GB]

16/11/23 13:24:02 ERROR TaskSchedulerImpl: Lost executor 3 on xxx001: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

16/11/23 13:24:05 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job