Re: SparkRDD with Ignite
Vidhya, Are you using Spring Cache? Because that's what this issue is about. I'm not sure how this is related to your use case. -Val -- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/SparkRDD-with-Ignite-tp9160p9354.html Sent from the Apache Ignite Users mailing list archive at Nabble.com.
Re: SparkRDD with Ignite
Vidhya, I'm not aware of this issue, but it sounds pretty serious. Is there a ticket and/or discussion about this you can point me to? -Val
Re: SparkRDD with Ignite
Thanks for the reply, Val. Looks like the issue is with having null fields in the database. I read that null fields are handled in the Ignite 1.8 release. Is that right? Thanks, Vidhya

From: vkulichenko <valentin.kuliche...@gmail.com> Sent: Wednesday, November 30, 2016 3:43 PM To: user@ignite.apache.org Subject: Re: SparkRDD with Ignite
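If null column values are indeed what trips up saving in 1.6/1.7, one possible workaround is to filter them out before handing the RDD to Ignite. This is an untested sketch that assumes the nulls are in the intended key column DRIVER_TYPE, and reuses the df2 and igniteRDD definitions from the messages further down the thread:

```scala
import org.apache.spark.sql.Row

// Drop rows whose key column is null, then build (key, row) pairs.
// Assumes df2 is the JDBC-backed DataFrame and igniteRDD the
// IgniteRDD[String, Row] from the earlier messages in this thread.
val nonNullRdd = df2.rdd
  .filter(row => !row.isNullAt(row.fieldIndex("DRIVER_TYPE")))
  .map(row => (row.getAs[String]("DRIVER_TYPE"), row))

igniteRDD.savePairs(nonNullRdd)
```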
Re: SparkRDD with Ignite
It depends on how big your data set is, where the data is coming from, etc. I don't think Ignite is a bottleneck here. -Val
Re: SparkRDD with Ignite
Have Spark version 1.6.1 with apache-ignite-1.6.0-src. Spark is installed on all 10 nodes, but Ignite is installed on only one node (the master). Is that an issue? Tried the lines below, and it has been running for the past 20 mins without any errors:

scala> val rdd = df2.map(row => (row.getAs[String]("DRIVER_TYPE"), row))
rdd: org.apache.spark.rdd.RDD[(String, org.apache.spark.sql.Row)] = MapPartitionsRDD[9] at map at :42

scala> igniteRDD.savePairs(rdd)

Is it expected to take this long? See below in the Ignite log console:

[23-11-2016 17:32:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7a4a9bc9, name=null, uptime=02:12:00:684]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=16]
    ^-- CPU [cur=0.07%, avg=0.04%, GC=0%]
    ^-- Heap [used=222MB, free=77.3%, comm=982MB]
    ^-- Non heap [used=33MB, free=89.04%, comm=33MB]
    ^-- Public thread pool [active=0, idle=32, qSize=0]
    ^-- System thread pool [active=0, idle=32, qSize=0]
    ^-- Outbound messages queue [size=0]
[23-11-2016 17:33:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7a4a9bc9, name=null, uptime=02:13:00:684]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=16]
    ^-- CPU [cur=0.03%, avg=0.04%, GC=0%]
    ^-- Heap [used=229MB, free=76.62%, comm=982MB]
    ^-- Non heap [used=33MB, free=89.03%, comm=33MB]
    ^-- Public thread pool [active=0, idle=32, qSize=0]
    ^-- System thread pool [active=0, idle=32, qSize=0]
    ^-- Outbound messages queue [size=0]
[23-11-2016 17:34:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7a4a9bc9, name=null, uptime=02:14:00:685]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=16]
    ^-- CPU [cur=0.03%, avg=0.04%, GC=0%]
    ^-- Heap [used=235MB, free=75.98%, comm=982MB]
    ^-- Non heap [used=33MB, free=89.03%, comm=33MB]
    ^-- Public thread pool [active=0, idle=32, qSize=0]
    ^-- System thread pool [active=0, idle=32, qSize=0]
    ^-- Outbound messages queue [size=0]
[23-11-2016 17:35:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7a4a9bc9, name=null, uptime=02:15:00:687]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=16]
    ^-- CPU [cur=0.07%, avg=0.04%, GC=0%]
    ^-- Heap [used=242MB, free=75.33%, comm=982MB]
    ^-- Non heap [used=33MB, free=89.03%, comm=33MB]
    ^-- Public thread pool [active=0, idle=32, qSize=0]
    ^-- System thread pool [active=0, idle=32, qSize=0]
    ^-- Outbound messages queue [size=0]
[23-11-2016 17:36:30][INFO ][grid-timeout-worker-#65%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7a4a9bc9, name=null, uptime=02:16:00:691]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=16]
    ^-- CPU [cur=0.07%, avg=0.04%, GC=0%]
    ^-- Heap [used=246MB, free=74.9%, comm=982MB]
    ^-- Non heap [used=33MB, free=89.03%, comm=33MB]
    ^-- Public thread pool [active=0, idle=32, qSize=0]
    ^-- System thread pool [active=0, idle=32, qSize=0]
    ^-- Outbound messages queue [size=0]

Thanks, Vidhya

From: vkulichenko <valentin.kuliche...@gmail.com> Sent: Wednesday, November 23, 2016 3:07 PM To: user@ignite.apache.org Subject: Re: SparkRDD with Ignite
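Running the Ignite server on a single node is not wrong by itself, but with a bare default IgniteConfiguration the Spark executors can fail to find that server (or discover each other into an unintended topology), and savePairs can then appear to hang. A sketch (untested; "master-host" is a placeholder for the machine actually running the Ignite server, and 47500..47509 is the default discovery port range) of pinning discovery to that node:

```scala
import org.apache.ignite.spark.IgniteContext
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder
import org.apache.spark.sql.Row

// Build a configuration whose discovery SPI points only at the
// host that runs the Ignite server node ("master-host" is a placeholder).
def igniteCfg(): IgniteConfiguration = {
  val ipFinder = new TcpDiscoveryVmIpFinder()
  ipFinder.setAddresses(java.util.Arrays.asList("master-host:47500..47509"))

  val spi = new TcpDiscoverySpi()
  spi.setIpFinder(ipFinder)

  val cfg = new IgniteConfiguration()
  cfg.setDiscoverySpi(spi)
  cfg
}

// The closure is shipped to every executor, so each one joins the same grid.
val igniteContext = new IgniteContext[String, Row](sc, () => igniteCfg())
val igniteRDD = igniteContext.fromCache("partitioned")
```

Note that with a single server node all the saved data still lands on that one machine, so the 10-node Spark cluster only parallelizes the write, not the storage.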
Re: SparkRDD with Ignite
What version of Scala do you have? Also, why are you on Ignite 1.6? Can you switch to 1.7? -Val
Re: SparkRDD with Ignite
Thanks, Val. Yes, the executors were missing the driver. I added the driver, which eliminated the missing-driver warning, and now I see java.lang.NoSuchMethodError at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues. The task failures are logged as warnings, though; I only see an ERROR when the executor loss threshold is exceeded. Any inputs here will be of great help.

scala> sharedRDD.saveValues(df2.rdd)
[15:28:36] Topology snapshot [ver=5, servers=2, clients=1, CPUs=16, heap=3.0GB]
16/11/23 15:28:37 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 5, finact-poc-001): java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:151)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:150)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:150)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:138)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[15:28:37] Topology snapshot [ver=6, servers=1, clients=1, CPUs=16, heap=2.0GB]
16/11/23 15:28:37 ERROR TaskSchedulerImpl: Lost executor 7 on finact-poc-001: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/23 15:28:46 ERROR TaskSchedulerImpl: Lost executor 11 on finact-poc-004.cisco.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/23 15:28:54 WARN TaskSetManager: Lost task 0.2 in stage 2.0 (TID 7, finact-poc-002.cisco.com): java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:151)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:150)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:150)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:138)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Thanks, Vidhya

From: vkulichenko <valentin.kuliche...@gmail.com> Sent: Wednesday, November 23, 2016 12:18 PM To: user@ignite.apache.org Subject: Re: SparkRDD with Ignite
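For what it's worth, a NoSuchMethodError on scala.Predef$.$conforms() is the classic symptom of a Scala binary-version mismatch: $conforms exists in Scala 2.11 but not in 2.10, so ignite-spark jars built against 2.11 fail at runtime on a Spark distribution that ships Scala 2.10 (as the prebuilt Spark 1.6.x binaries do). A quick check from the spark-shell:

```scala
// Print the Scala version the Spark REPL is actually running on.
// A 2.10.x result here, combined with 2.11-built Ignite jars on the
// classpath, would produce exactly this NoSuchMethodError.
println(scala.util.Properties.versionNumberString)
```

If the versions disagree, the fix is to use the ignite-spark artifact built for the same Scala binary version as the Spark installation.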
Re: SparkRDD with Ignite
Where is this exception failing? Is it on an executor node? Does it work if you execute something like foreachPartition on the original RDD? For now it looks like the executors are just missing the Oracle driver and therefore can't load rows from the database. Ignite is not even touched yet at this point. -Val
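One common way to address that missing-driver symptom is to name the driver class explicitly in the JDBC properties, so Spark registers it on the executors as well as the driver JVM. A sketch with placeholder credentials and URL (the ojdbc jar must still be shipped to the executors, e.g. via spark-shell's --jars option):

```scala
import java.util.Properties

// Placeholder connection details; replace with the real host/service/credentials.
val url = "jdbc:oracle:thin:@//db-host:1521/SERVICE"

val props = new Properties()
props.setProperty("user", "scott")      // placeholder
props.setProperty("password", "tiger")  // placeholder
// Naming the driver class forces each executor to register it
// before opening connections, avoiding "Did not find registered driver".
props.setProperty("driver", "oracle.jdbc.OracleDriver")

val df2 = sqlContext.read.jdbc(url, "SAMPLE_DATA", props)
```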
SparkRDD with Ignite
Have been trying extensively to integrate Spark RDDs with Ignite. We have Spark version 1.6.1 with apache-ignite-1.6.0-src. Tried out the example in "https://apacheignite-fs.readme.io/docs/testing-integration-with-spark-shell" and it is working fine. However, all efforts to connect Spark DataFrames/RDDs have been fruitless. The steps are listed below; any inputs on what is missing will be of great help. We have a 10-node Spark/Hadoop cluster. Please also suggest if there is a better approach than converting DataFrames to RDDs and using Ignite saveValues.

1. val df2 = sqlContext.read.jdbc(url, "SAMPLE_DATA", props) // DataFrame that reads the Oracle SAMPLE_DATA table. df2.show() returns all the values in the table, so we are good from the Spark perspective at this point.
2. import org.apache.spark.rdd.RDD
   import org.apache.spark.sql.Row
   val rows: RDD[Row] = df2.rdd // Converting the DataFrame to an RDD

Setting up Ignite properties:
1. scala> import org.apache.ignite.spark._
2. scala> import org.apache.ignite.configuration._
3. val igniteContext = new IgniteContext[org.apache.spark.sql.Row,org.apache.spark.sql.Row](sc, () => new IgniteConfiguration())
4. val sharedRDD = igniteContext.fromCache("partitioned")
5. sharedRDD.saveValues(rows)

After this, hitting the below error:
scala> sharedRDD.saveValues(rows)
16/11/23 13:23:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, xxx006.xxx.com): java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[13:24:00] Topology snapshot [ver=28, servers=3, clients=1, CPUs=16, heap=4.0GB]
16/11/23 13:24:01 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, xxx001): java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:151)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:150)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:150)
    at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:138)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[13:24:01] Topology snapshot [ver=29, servers=2, clients=1, CPUs=16, heap=3.0GB]
16/11/23 13:24:02 ERROR TaskSchedulerImpl: Lost executor 3 on xxx001: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/23 13:24:05 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
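For completeness, a variant of steps 3-5 above that attaches the shared RDD through an explicit CacheConfiguration instead of a bare cache name. This is an untested sketch: "rows" is an arbitrary cache name, and the fromCache(CacheConfiguration) overload is assumed from the 1.6 ignite-spark API:

```scala
import org.apache.ignite.spark.IgniteContext
import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}
import org.apache.ignite.cache.CacheMode
import org.apache.spark.sql.Row

val igniteContext = new IgniteContext[Row, Row](sc, () => new IgniteConfiguration())

// Describe the cache explicitly ("rows" is an arbitrary name),
// so it is created with a known mode rather than whatever default
// a name-only lookup would apply.
val cacheCfg = new CacheConfiguration[Row, Row]("rows")
cacheCfg.setCacheMode(CacheMode.PARTITIONED)

val sharedRDD = igniteContext.fromCache(cacheCfg)
sharedRDD.saveValues(df2.rdd)
```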