Re: Error report file is deleted automatically after spark application finished
Thanks for the information; my problem is resolved now.

I have one more issue: I am not able to save a core dump file. The JVM always shows:

# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

I set the core dump limit to unlimited on all nodes by editing /etc/security/limits.conf and adding the line "* soft core unlimited". I rechecked with:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 241204
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 241204
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

But when my Spark application crashes, it still shows the error "Failed to write core dump. Core dumps have been disabled. To enable core dumping, try 'ulimit -c unlimited' before starting Java again".

Regards
Prateek

On Wed, Jun 29, 2016 at 9:30 PM, dhruve ashar <dhruveas...@gmail.com> wrote:
> You can look at the yarn-default configuration file.
>
> Check your log-related settings to see if log aggregation is enabled, and
> also the log retention duration, to see if it is too small and files are
> being deleted.
>
> On Wed, Jun 29, 2016 at 4:47 PM, prateek arora <prateek.arora...@gmail.com> wrote:
>
>> Hi
>>
>> My Spark application crashed and showed this information:
>>
>> LogType:stdout
>> Log Upload Time:Wed Jun 29 14:38:03 -0700 2016
>> LogLength:1096
>> Log Contents:
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # SIGILL (0x4) at pc=0x7f67baa0d221, pid=12207, tid=140083473176320
>> #
>> # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
>> # Problematic frame:
>> # C  [libcaffe.so.1.0.0-rc3+0x786221]  sgemm_kernel+0x21
>> #
>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log
>>
>> but I am not able to find that hs_err_pid12207.log file; it is deleted
>> automatically after the Spark application finishes.
>>
>> How can I retain the report file? I am running Spark on YARN.
>>
>> Regards
>> Prateek
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Error-report-file-is-deleted-automatically-after-spark-application-finished-tp27247.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
> --
> -Dhruve Ashar
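A detail worth checking when limits.conf changes do not seem to take effect: ulimit values are per-process and inherited from the parent, and /etc/security/limits.conf is only applied to PAM login sessions. A YARN NodeManager started as a system service can therefore keep its old limit of 0 and pass it on to every executor JVM it launches. A minimal shell sketch of the inheritance behavior:

```shell
# Limits are inherited per-process: a subshell sees whatever its parent
# set, regardless of what limits.conf says for login sessions.
( ulimit -c 0; ulimit -c )    # prints: 0
# Raising the limit in an interactive shell does not affect an
# already-running NodeManager; it must be set in the service's own
# startup environment and the service restarted.
```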
Error report file is deleted automatically after spark application finished
Hi

My Spark application crashed and showed this information:

LogType:stdout
Log Upload Time:Wed Jun 29 14:38:03 -0700 2016
LogLength:1096
Log Contents:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGILL (0x4) at pc=0x7f67baa0d221, pid=12207, tid=140083473176320
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libcaffe.so.1.0.0-rc3+0x786221]  sgemm_kernel+0x21
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log

But I am not able to find the file "/yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log"; it is deleted automatically after the Spark application finishes.

How can I retain the report file? I am running Spark on YARN.

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-report-file-is-deleted-automatically-after-spark-application-finished-tp27247.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
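One way to keep diagnostics like hs_err_pid*.log after the container exits, sketched below under the assumption that the cluster runs YARN; the application ID is taken from the log above, and the property name comes from the Hadoop yarn-default documentation:

```shell
# If log aggregation is enabled, container stdout/stderr (where the JVM
# also prints the crash summary) can be retrieved after the app finishes:
yarn logs -applicationId application_1467236060045_0001 > app.log

# The hs_err file itself is written to the container working directory,
# which the NodeManager deletes when the container ends. Setting this in
# yarn-site.xml delays that cleanup (value in seconds), leaving time to
# copy the file out:
#   yarn.nodemanager.delete.debug-delay-sec = 600
```

The debug-delay setting is meant for debugging only; it keeps every finished container's scratch space around, so it should be reverted afterwards.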
Re: How to enable core dump in spark
Hi

I am using Spark on YARN. How can I make sure that the ulimit settings are applied to the Spark process?

I set the core dump limit to unlimited on all nodes by editing /etc/security/limits.conf and adding the line "* soft core unlimited". I rechecked with:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 241204
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 241204
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Regards
Prateek

On Thu, Jun 16, 2016 at 4:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> Can you make sure that the ulimit settings are applied to the Spark
> process? Is this Spark on YARN or Standalone?
>
> Pozdrawiam,
> Jacek Laskowski
>
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Wed, Jun 1, 2016 at 7:55 PM, prateek arora <prateek.arora...@gmail.com> wrote:
> > Hi
> >
> > I am using Cloudera to set up Spark 1.6.0 on Ubuntu 14.04.
> >
> > I set the core dump limit to unlimited on all nodes by editing
> > /etc/security/limits.conf and adding the line "* soft core unlimited".
> >
> > I rechecked with: $ ulimit -a
> >
> > core file size          (blocks, -c) unlimited
> > data seg size           (kbytes, -d) unlimited
> > scheduling priority             (-e) 0
> > file size               (blocks, -f) unlimited
> > pending signals                 (-i) 241204
> > max locked memory       (kbytes, -l) 64
> > max memory size         (kbytes, -m) unlimited
> > open files                      (-n) 1024
> > pipe size            (512 bytes, -p) 8
> > POSIX message queues     (bytes, -q) 819200
> > real-time priority              (-r) 0
> > stack size              (kbytes, -s) 8192
> > cpu time               (seconds, -t) unlimited
> > max user processes              (-u) 241204
> > virtual memory          (kbytes, -v) unlimited
> > file locks                      (-x) unlimited
> >
> > But when I run my Spark application with some third-party native
> > libraries, it sometimes crashes with the error "Failed to write core
> > dump. Core dumps have been disabled. To enable core dumping, try
> > 'ulimit -c unlimited' before starting Java again".
> >
> > Below is the log:
> >
> > # A fatal error has been detected by the Java Runtime Environment:
> > #
> > # SIGSEGV (0xb) at pc=0x7fd44b491fb9, pid=20458, tid=140549318547200
> > #
> > # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
> > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
> > # Problematic frame:
> > # V  [libjvm.so+0x650fb9]  jni_SetByteArrayRegion+0xa9
> > #
> > # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> > #
> > # An error report file with more information is saved as:
> > # /yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
> > #
> > # If you would like to submit a bug report, please visit:
> > # http://bugreport.sun.com/bugreport/crash.jsp
> > #
> >
> > So how can I enable the core dump and save it somewhere?
> >
> > Regards
> > Prateek
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-core-dump-in-spark-tp27065.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
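Even with the soft limit raised, the kernel's core_pattern decides where a core file is written; pointing it at a world-writable persistent directory keeps dumps out of the container scratch space that YARN deletes. A sketch assuming root access on each node (all paths are illustrative):

```shell
# Write cores to a stable location instead of the process working dir
# (%e = executable name, %p = pid).
sudo mkdir -p /var/coredumps
sudo chmod 1777 /var/coredumps
sudo sysctl -w kernel.core_pattern=/var/coredumps/core.%e.%p
# And make the raised limit part of the NodeManager's own environment,
# e.g. "ulimit -c unlimited" in its init script, then restart the service.
```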
Re: How to enable core dump in spark
Please help me to solve my problem.

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-core-dump-in-spark-tp27065p27081.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
How to enable core dump in spark
Hi

I am using Cloudera to set up Spark 1.6.0 on Ubuntu 14.04.

I set the core dump limit to unlimited on all nodes by editing /etc/security/limits.conf and adding the line "* soft core unlimited".

I rechecked with:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 241204
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 241204
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

But when I run my Spark application with some third-party native libraries, it sometimes crashes with the error "Failed to write core dump. Core dumps have been disabled. To enable core dumping, try 'ulimit -c unlimited' before starting Java again".

Below is the log:

# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7fd44b491fb9, pid=20458, tid=140549318547200
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x650fb9]  jni_SetByteArrayRegion+0xa9
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#

So how can I enable the core dump and save it somewhere?

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-core-dump-in-spark-tp27065.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
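A quick way to confirm whether the unlimited setting actually reached a running executor, rather than just a login shell, is to read the process's limits from /proc (Linux-specific; the executor PID placeholder below would come from `jps` or the YARN UI):

```shell
# Every Linux process exposes its effective limits; a login shell's
# ulimit output says nothing about a JVM launched by the NodeManager.
grep 'core file' /proc/$$/limits    # inspect the current shell as a demo
# For an executor: grep 'core file' /proc/<executor-pid>/limits
```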
Re: How to get and save core dump of native library in executors
Please help to solve my problem.

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945p26967.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: How to get and save core dump of native library in executors
I am running my cluster on Ubuntu 14.04.

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945p26952.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: How to get and save core dump of native library in executors
Ubuntu 14.04

On Thu, May 12, 2016 at 2:40 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Which OS are you using ?
>
> See http://en.linuxreviews.org/HOWTO_enable_core-dumps
>
> On Thu, May 12, 2016 at 2:23 PM, prateek arora <prateek.arora...@gmail.com> wrote:
>
>> Hi
>>
>> I am running my Spark application with some third-party native libraries,
>> but it sometimes crashes with the error "Failed to write core dump. Core
>> dumps have been disabled. To enable core dumping, try 'ulimit -c unlimited'
>> before starting Java again".
>>
>> Below is the log:
>>
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # SIGSEGV (0xb) at pc=0x7fd44b491fb9, pid=20458, tid=140549318547200
>> #
>> # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
>> # Problematic frame:
>> # V  [libjvm.so+0x650fb9]  jni_SetByteArrayRegion+0xa9
>> #
>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
>> #
>> # If you would like to submit a bug report, please visit:
>> # http://bugreport.sun.com/bugreport/crash.jsp
>> #
>>
>> so how can I enable the core dump and save it somewhere?
>>
>> Regards
>> Prateek
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
How to get and save core dump of native library in executors
Hi

I am running my Spark application with some third-party native libraries, but it sometimes crashes with the error "Failed to write core dump. Core dumps have been disabled. To enable core dumping, try 'ulimit -c unlimited' before starting Java again".

Below is the log:

# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7fd44b491fb9, pid=20458, tid=140549318547200
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x650fb9]  jni_SetByteArrayRegion+0xa9
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#

So how can I enable the core dump and save it somewhere?

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
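Once a core file is actually produced, the native frame reported in the crash log can be inspected with gdb; a sketch, assuming gdb is installed on the node and using a placeholder core file name:

```shell
# Load the core against the java binary and print the backtrace of the
# crashing thread; "core.java.20458" is a placeholder file name.
gdb "$(command -v java)" core.java.20458 -batch -ex "bt"
```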
spark 1.6 : RDD Partitions not distributed evenly to executors
Hi

My Spark Streaming application receives data from one Kafka topic (one partition), and the RDD has 30 partitions. But the scheduler schedules all tasks on executors running on the same host (where the Kafka topic partition was created), with NODE_LOCAL locality level.

Below are the logs:

16/05/06 11:21:38 INFO YarnScheduler: Adding task set 1.0 with 30 tasks
16/05/06 11:21:38 DEBUG TaskSetManager: Epoch for TaskSet 1.0: 1
16/05/06 11:21:38 DEBUG TaskSetManager: Valid locality levels for TaskSet 1.0: NODE_LOCAL, RACK_LOCAL, ANY
16/05/06 11:21:38 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, ivcp-m04.novalocal, partition 0,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, ivcp-m04.novalocal, partition 1,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, ivcp-m04.novalocal, partition 2,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, ivcp-m04.novalocal, partition 3,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5, ivcp-m04.novalocal, partition 4,NODE_LOCAL, 2248 bytes)

I have seen this behavior after upgrading Spark from 1.5 to 1.6; the same application distributed RDD partitions evenly across executors in Spark 1.5.

As mentioned on some Spark developer blogs, I tried spark.shuffle.reduceLocality.enabled=false, and after that my RDD partitions are distributed across executors on all hosts with PROCESS_LOCAL locality level.

Below are the logs:

16/05/06 11:24:46 INFO YarnScheduler: Adding task set 1.0 with 30 tasks
16/05/06 11:24:46 DEBUG TaskSetManager: Valid locality levels for TaskSet 1.0: NO_PREF, ANY
16/05/06 11:24:46 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, ivcp-m02.novalocal, partition 0,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, ivcp-m01.novalocal, partition 1,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, ivcp-m06.novalocal, partition 2,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, ivcp-m04.novalocal, partition 3,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5, ivcp-m04.novalocal, partition 4,PROCESS_LOCAL, 2248 bytes)

Is the above configuration the correct solution for this problem? And why is spark.shuffle.reduceLocality.enabled not mentioned in the Spark configuration documentation?

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-RDD-Partitions-not-distributed-evenly-to-executors-tp26911.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
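For reference, the settings touched on above can be passed at submit time. spark.shuffle.reduceLocality.enabled is an internal setting in Spark 1.6, which is presumably why it does not appear on the configuration page; spark.locality.wait is the documented knob for relaxing locality preferences instead. A sketch with a placeholder application jar:

```shell
spark-submit \
  --master yarn \
  --conf spark.shuffle.reduceLocality.enabled=false \
  --conf spark.locality.wait=0s \
  streaming-app.jar   # placeholder application jar
```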
Re: is there any way to submit spark application from outside of spark cluster
Hi

Thanks for the information; it will definitely solve my problem.

I have one more question: if I want to launch a Spark application in a production environment, is there any way for multiple users to submit their jobs without each having the Hadoop configuration?

Regards
Prateek

On Fri, Mar 25, 2016 at 10:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> See this thread:
>
> http://search-hadoop.com/m/q3RTtAvwgE7dEI02
>
> On Fri, Mar 25, 2016 at 10:39 AM, prateek arora <prateek.arora...@gmail.com> wrote:
>
>> Hi
>>
>> I want to submit a Spark application from outside of the Spark cluster.
>> Please point me to information about how to do this.
>>
>> Regards
>> Prateek
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/is-there-any-way-to-submit-spark-application-from-outside-of-spark-cluster-tp26599.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
is there any way to submit spark application from outside of spark cluster
Hi

I want to submit a Spark application from outside of the Spark cluster. Please point me to information about how to do this.

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/is-there-any-way-to-submit-spark-application-from-outside-of-spark-cluster-tp26599.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
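The usual approach, sketched here for a YARN cluster, is to turn an outside machine into a gateway by giving it the cluster's client configuration; the paths and class name below are placeholders:

```shell
# Copy the cluster's Hadoop client configs to the edge machine, then
# point spark-submit at them; YARN resolves the cluster from there.
export HADOOP_CONF_DIR=/etc/hadoop/conf     # copied from a cluster node
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.Main \
  app.jar                                   # placeholder class and jar
```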
Re: How to distribute dependent files (.so , jar ) across spark worker nodes
Hi

I do not want to create a single jar that contains all the other dependencies, because it would increase the size of my Spark job jar. So I want to copy all the libraries across the cluster using some automation process, as I am currently doing with Chef. But I am not sure whether that is the right method or not.

Regards
Prateek

On Mon, Mar 14, 2016 at 2:31 PM, Jakob Odersky <ja...@odersky.com> wrote:
> Have you tried setting the configuration
> `spark.executor.extraLibraryPath` to point to a location where your
> .so's are available? (Not sure if non-local files, such as HDFS, are
> supported)
>
> On Mon, Mar 14, 2016 at 2:12 PM, Tristan Nixon <st...@memeticlabs.org> wrote:
> > What build system are you using to compile your code?
> > If you use a dependency management system like Maven or SBT, then you
> > should be able to instruct it to build a single jar that contains all
> > the other dependencies, including third-party jars and .so's. I am a
> > Maven user myself, and I use the shade plugin for this:
> > https://maven.apache.org/plugins/maven-shade-plugin/
> >
> > However, if you are using SBT or another dependency manager, someone
> > else on this list may be able to give you help on that.
> >
> > If you're not using a dependency manager - well, you should be. Trying
> > to manage this manually is a pain that you do not want to get in the
> > way of your project. There are perfectly good tools to do this for
> > you; use them.
> >
> >> On Mar 14, 2016, at 3:56 PM, prateek arora <prateek.arora...@gmail.com> wrote:
> >>
> >> Hi
> >>
> >> Thanks for the information.
> >>
> >> But my problem is that if I want to write a Spark application which
> >> depends on third-party libraries like OpenCV, what is the best
> >> approach to distribute all the .so and jar files of OpenCV across
> >> the cluster?
> >>
> >> Regards
> >> Prateek
> >>
> >> --
> >> View this message in context:
> >> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-distribute-dependent-files-so-jar-across-spark-worker-nodes-tp26464p26489.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
Re: How to distribute dependent files (.so , jar ) across spark worker nodes
Hi

Thanks for the information.

But my problem is that if I want to write a Spark application which depends on third-party libraries like OpenCV, what is the best approach to distribute all the .so and jar files of OpenCV across the cluster?

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-distribute-dependent-files-so-jar-across-spark-worker-nodes-tp26464p26489.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
How to distribute dependent files (.so , jar ) across spark worker nodes
Hi

I have a multi-node cluster, and my Spark jobs depend on a native library (.so files) and some jar files. Can someone please explain the best ways to distribute these dependent files across the nodes?

Right now I copy the dependent files to all nodes using the Chef tool.

Regards
Prateek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-distribute-dependent-files-so-jar-across-spark-worker-nodes-tp26464.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
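Besides configuration-management tools like Chef, spark-submit itself can ship dependencies with the application; a sketch with placeholder file names, assuming the jar and .so files are readable on the submitting machine:

```shell
# --jars adds jars to driver and executor classpaths; --files ships
# arbitrary files (such as .so libraries) into each container's working
# directory, which extraLibraryPath then puts on the loader path.
spark-submit \
  --master yarn \
  --jars opencv.jar \
  --files libopencv_java.so \
  --conf spark.executor.extraLibraryPath=. \
  app.jar   # all file names here are placeholders
```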
Re: Spark REST API shows Error 503 Service Unavailable
Hi Vikram,

As per a Cloudera person: "There is a minor bug with the way the classpath is setup for the Spark HistoryServer in 5.5.0 which causes the observed error when using the REST API (as a result of bad jersey versions (1.9) being included). This will be fixed in CDH and CM 5.5.2 (yet to be released) onwards."

On Thu, Dec 17, 2015 at 3:24 PM, Vikram Kone <vikramk...@gmail.com> wrote:
> Hi Prateek,
> Were you able to figure out why this is happening? I'm seeing the same
> error on my Spark standalone cluster.
>
> Any pointers, anyone?
>
> On Fri, Dec 11, 2015 at 2:05 PM, prateek arora <prateek.arora...@gmail.com> wrote:
>
>> Hi
>>
>> I am trying to access Spark using the REST API but got the error below:
>>
>> Command:
>>
>> curl http://:18088/api/v1/applications
>>
>> Response:
>>
>> Error 503 Service Unavailable
>>
>> HTTP ERROR 503
>>
>> Problem accessing /api/v1/applications. Reason:
>>     Service Unavailable
>> Caused by:
>> org.spark-project.jetty.servlet.ServletHolder$1: java.lang.reflect.InvocationTargetException
>>         at org.spark-project.jetty.servlet.ServletHolder.makeUnavailable(ServletHolder.java:496)
>>         at org.spark-project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:543)
>>         at org.spark-project.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:415)
>>         at org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:657)
>>         at org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
>>         at org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
>>         at org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
>>         at org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
>>         at org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>         at org.spark-project.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
>>         at org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>         at org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>         at org.spark-project.jetty.server.Server.handle(Server.java:370)
>>         at org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
>>         at org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
>>         at org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
>>         at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644)
>>         at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>>         at org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>>         at org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>>         at org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>>         at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>         at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>         at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.reflect.InvocationTargetException
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>         at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:728)
>>         at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:678)
>>         at co
Spark REST API shows Error 503 Service Unavailable
Hi I am trying to access Spark Using REST API but got below error : Command : curl http://:18088/api/v1/applications Response: Error 503 Service Unavailable HTTP ERROR 503 Problem accessing /api/v1/applications. Reason: Service Unavailable Caused by: org.spark-project.jetty.servlet.ServletHolder$1: java.lang.reflect.InvocationTargetException at org.spark-project.jetty.servlet.ServletHolder.makeUnavailable(ServletHolder.java:496) at org.spark-project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:543) at org.spark-project.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:415) at org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:657) at org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501) at org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) at org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.spark-project.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301) at org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.spark-project.jetty.server.Server.handle(Server.java:370) at org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) at org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) at org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644) at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at 
org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) at org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:728) at com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:678) at com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:203) at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:373) at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:556) at javax.servlet.GenericServlet.init(GenericServlet.java:244) at org.spark-project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:532) ... 
22 more Caused by: java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper.getOsgiRegistryInstance()Lcom/sun/jersey/core/osgi/OsgiRegistry; at com.sun.jersey.spi.scanning.AnnotationScannerListener$AnnotatedClassVisitor.getClassForName(AnnotationScannerListener.java:217) at com.sun.jersey.spi.scanning.AnnotationScannerListener$AnnotatedClassVisitor.visitEnd(AnnotationScannerListener.java:186) at org.objectweb.asm.ClassReader.accept(Unknown Source) at org.objectweb.asm.ClassReader.accept(Unknown Source) at com.sun.jersey.spi.scanning.AnnotationScannerListener.onProcess(AnnotationScannerListener.java:136) at com.sun.jersey.core.spi.scanning.JarFileScanner.scan(JarFileScanner.java:97) at com.sun.jersey.core.spi.scanning.uri.JarZipSchemeScanner$1.f(JarZipSchemeScanner.java:78) at com.sun.jersey.core.util.Closing.f(Closing.java:71) at com.sun.jersey.core.spi.scanning.uri.JarZipSchemeScanner.scan(JarZipSchemeScanner.java:75) at com.sun.jersey.core.spi.scanning.PackageNamesScanner.scan(PackageNamesScanner.java:223) at com.sun.jersey.core.spi.scanning.PackageNamesScanner.scan(PackageNamesScanner.java:139) at
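The root `NoSuchMethodError` on `com.sun.jersey.core.reflection.ReflectionHelper.getOsgiRegistryInstance()` is the classic signature of two incompatible Jersey versions on the server's classpath (Spark 1.x bundles Jersey 1.9, and a different jersey-core, e.g. one pulled in by a Hadoop/YARN install, can shadow it). A sketch for locating conflicting jars; the `SPARK_HOME`/`HADOOP_HOME` default paths below are assumptions to adjust for your installation:

```shell
# Sketch, with hypothetical install paths: list every jersey-core jar the
# server could load and print the version recorded in each jar's manifest.
for jar in $(find "${SPARK_HOME:-/opt/spark}" "${HADOOP_HOME:-/opt/hadoop}" \
    -name 'jersey-core*.jar' 2>/dev/null); do
  echo "== $jar"
  unzip -p "$jar" META-INF/MANIFEST.MF | grep -i version
done
```

If more than one version shows up, removing or excluding the extra jersey-core jar from the classpath is the usual fix.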
Can I process multiple batches in parallel in Spark Streaming?
Hi,

When I run my Spark Streaming application, the following information shows on the application's streaming UI. I am using Spark 1.5.0.

Batch Time           Input Size   Scheduling Delay   Processing Time   Status
2015/12/09 11:00:42  107 events   -                  -                 queued
2015/12/09 11:00:41  103 events   -                  -                 queued
2015/12/09 11:00:40  107 events   -                  -                 queued
2015/12/09 11:00:39  105 events   -                  -                 queued
2015/12/09 11:00:38  109 events   -                  -                 queued
2015/12/09 11:00:37  106 events   -                  -                 queued
2015/12/09 11:00:36  109 events   -                  -                 queued
2015/12/09 11:00:35  113 events   -                  -                 queued
2015/12/09 11:00:34  109 events   -                  -                 queued
2015/12/09 11:00:33  107 events   -                  -                 queued
2015/12/09 11:00:32  99 events    42 s               -                 processing

It seems batches are pushed into a queue and processed in FIFO order. Is it possible for all my active batches to start processing in parallel?

Regards
Prateek

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/can-i-process-multiple-batch-in-parallel-in-spark-streaming-tp25653.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Can I process multiple batches in parallel in Spark Streaming?
Hi,

Thanks. In my scenario the batches are independent, so is it safe to use this in a production environment?

Regards
Prateek

On Wed, Dec 9, 2015 at 11:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Have you seen this thread ?
>
> http://search-hadoop.com/m/q3RTtgSGrobJ3Je
>
> [...]
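By default Spark Streaming submits one batch job at a time. The setting usually pointed at for this, `spark.streaming.concurrentJobs`, is internal and undocumented (an assumption to verify against your Spark build), and batches lose their ordering guarantee when it is greater than 1, so it is only reasonable when batches really are independent. A minimal sketch:

```scala
// Sketch only: spark.streaming.concurrentJobs is an undocumented internal
// setting in Spark 1.x. With a value > 1, several batches can be in flight
// at once, but batch ordering is no longer guaranteed.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("concurrent-batches")
  .set("spark.streaming.concurrentJobs", "4") // up to 4 batches at once

val ssc = new StreamingContext(conf, Seconds(1))
```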
Can I write RDD transformation output to HDFS or any other storage system?
Hi,

Is it possible in Spark to write the output of an RDD transformation to HDFS or any other storage system?

Regards
Prateek

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/can-i-write-only-RDD-transformation-into-hdfs-or-any-other-storage-system-tp25637.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
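Transformations in Spark are lazy; persisting the result of a chain of transformations is itself an action, such as `saveAsTextFile`. A minimal sketch (the HDFS paths are hypothetical, and `sc` is assumed to be an existing `SparkContext`):

```scala
// Sketch: filter/map are lazy transformations; saveAsTextFile is the action
// that materializes them and writes the result to HDFS.
val transformed = sc.textFile("hdfs:///data/input") // hypothetical path
  .filter(_.nonEmpty)
  .map(_.toUpperCase)

// The output directory must not already exist, or Spark throws an error.
transformed.saveAsTextFile("hdfs:///data/output")   // hypothetical path
```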
Are multiple Spark contexts supported in Spark 1.5.0?
Hi,

I want to create multiple SparkContexts in my application. Many articles I have read suggest that "usage of multiple contexts is discouraged, since SPARK-2243 is still not resolved."

I want to know: does Spark 1.5.0 support creating multiple contexts without error? And if it does, do we need to set the "spark.driver.allowMultipleContexts" configuration parameter?

Regards
Prateek

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/is-Multiple-Spark-Contexts-is-supported-in-spark-1-5-0-tp25568.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Are multiple Spark contexts supported in Spark 1.5.0?
Hi Ted,

Thanks for the information. Is there any way for two different Spark applications to share their data?

Regards
Prateek

On Fri, Dec 4, 2015 at 9:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> See Josh's response in this thread:
>
> http://search-hadoop.com/m/q3RTt1z1hUw4TiG1=Re+Question+about+yarn+cluster+mode+and+spark+driver+allowMultipleContexts
>
> Cheers
>
> [...]
Re: Are multiple Spark contexts supported in Spark 1.5.0?
Thanks ... Is there any way for my second application to run in parallel and wait to fetch data from HBase or any other data storage system?

Regards
Prateek

On Fri, Dec 4, 2015 at 10:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> How about using NoSQL data store such as HBase :-)
>
> [...]
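Two SparkContexts cannot share RDDs directly; the usual pattern is to hand data between applications through shared external storage (HDFS, HBase, a database, etc.). A sketch of one such hand-off via Parquet on HDFS, with hypothetical paths and `df`/`sqlContext` assumed to exist in each application:

```scala
// Application A (writer) - sketch with a hypothetical shared HDFS path.
// df is an existing DataFrame in application A.
df.write.parquet("hdfs:///shared/events")

// Application B (reader) - in Spark 1.5 the SQL entry point is SQLContext.
// It sees the data once application A has finished writing it.
val shared = sqlContext.read.parquet("hdfs:///shared/events")
```

For lower-latency exchange while both applications run, an external store such as HBase (as suggested above) or a message queue is the more common design choice.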
How can a Spark Streaming application start working on the next batch before completing the previous batch?
Hi,

I am using Spark Streaming with Kafka. The Spark version is 1.5.0 and the batch interval is 1 second.

In my scenario the algorithm takes 7-10 seconds to process one batch interval's data, so Spark Streaming starts processing the next batch only after completing the previous one. I want my Spark Streaming application to start working on the next batch before the previous batch has completed, i.e. batches should execute in parallel.

Please help me solve this problem.

Regards
Prateek

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-spark-streaming-application-start-working-on-next-batch-before-completing-on-previous-batch-tp25559.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: How can I evenly distribute my records across all partitions?
Hi,

Thanks for the help. In my case I want to perform an operation on 30 records per second using Spark Streaming. The difference between the keys of consecutive records is around 33-34 ms, and my RDD of 30 records already has 4 partitions. Right now my algorithm takes around 400 ms to process one record, so I want to distribute the records evenly, so that every executor performs the operation on only one record and my 1-second batch completes without delay.

On Tue, Nov 17, 2015 at 7:50 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
> Think about how you want to distribute your data and how your keys are
> spread currently. Do you want to compute something per day, per week etc.
> Based on that, return a partition number. You could use mod 30 or some such
> function to get the partitions.
>
> [...]
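One way to get exactly one record per partition regardless of the key values (a sketch, not from the thread) is to index the records with `zipWithIndex` and partition by that dense index:

```scala
import org.apache.spark.Partitioner

// Sketch: routes record i to partition i, so an RDD of n records
// repartitioned with numPartitions = n gets exactly one record each.
class ExactPartitioner(override val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int =
    (key.asInstanceOf[Long] % numPartitions).toInt
}

// rdd: RDD[(Long, Array[Byte])] with timestamp keys (assumed shape).
val oneRecordPerPartition = rdd
  .zipWithIndex()                       // ((ts, jpeg), index)
  .map { case (kv, idx) => (idx, kv) }  // re-key by the dense index
  .partitionBy(new ExactPartitioner(30))
  .values                               // back to (ts, jpeg) pairs
```

This avoids having to derive a partition number from timestamp keys at all, at the cost of the extra `zipWithIndex` pass.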
Re: How can I evenly distribute my records across all partitions?
Hi,

Thanks. I am new to Spark development, so can you provide some help with writing a custom partitioner to achieve this? If you have a link or an example of writing a custom partitioner, please share it with me.

On Mon, Nov 16, 2015 at 6:13 PM, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:
> You can write your own custom partitioner to achieve this
>
> Regards
> Sab
>
> [...]
Re: How can I evenly distribute my records across all partitions?
Hi,

I am trying to implement a custom partitioner using this link:
http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
(in the linked example the key values run from 0 to (noOfElements - 1)), but I am not able to understand how to implement a custom partitioner in my case: my parent RDD has 4 partitions, the RDD key is a timestamp, and the value is a JPEG byte array.

Regards
Prateek

On Tue, Nov 17, 2015 at 9:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Please take a look at the following for example:
>
> ./core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala
> ./core/src/main/scala/org/apache/spark/Partitioner.scala
>
> Cheers
>
> [...]
How can I evenly distribute my records across all partitions?
Hi,

I have an RDD with 30 records (key/value pairs) and am running 30 executors. I want to repartition this RDD into 30 partitions so that every partition gets one record and is assigned to one executor.

When I use rdd.repartition(30) it repartitions my RDD into 30 partitions, but some partitions get 2 records, some get 1 record, and some get no records at all.

Is there any way in Spark to evenly distribute my records across all partitions?

Regards
Prateek

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-can-evenly-distribute-my-records-in-all-partition-tp25394.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark RDD: want to combine elements that have approximately the same keys
Hi,

In my scenario I have an RDD of key/value pairs. I want to combine elements whose keys are approximately the same, like (144,value)(143,value)(142,value)...(214,value)(213,value)(212,value)(313,value)(314,value)...

I want to combine the elements with keys 144, 143, 142..., meaning keys within a ±2 range of each other, and the same for keys 214, 213, 212 and so on. How can I do this?

regards
prateek

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Sprk-RDD-want-to-combine-elements-that-have-approx-same-keys-tp24644.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
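A common approach (a sketch, under the assumption that each cluster of keys fits inside a fixed-width window, as in the example where 142-144 and 212-214 each span at most 5 consecutive values) is to snap every key to a bucket and then combine within the bucket:

```scala
// Sketch: snap each non-negative integer key down to a multiple of 5, so
// 142..144 -> 140 and 212..214 -> 210, then combine everything that landed
// in the same bucket. Assumes clusters never straddle a bucket boundary.
val combined = rdd
  .map { case (k, v) => (k - (k % 5), v) }
  .groupByKey()
```

If clusters can straddle bucket boundaries, bucketing breaks down and a range-based merge is needed instead (sort by key and merge neighbors whose keys differ by at most 2).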
Getting java.io.FileNotFoundException when using the addFile function
I am trying to write a simple program using the addFile function, but I get an error on my worker node that the file does not exist:

Stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, slave2.novalocal): java.io.FileNotFoundException: File file:/tmp/spark-2791415c-8c20-4920-b3cd-5a6b8b6f3f8d/userFiles-a5a98f06-2d38-4876-8c8c-82a10ac5431f/csv_ip.csv does not exist

The code is as below:

val sc = new SparkContext(sparkConf)
val inputFile = "file:///home/ubuntu/test/Spark_CSV/spark_csv_job/csv_ip.csv"
sc.addFile(inputFile)
val inFile = sc.textFile("file://" + SparkFiles.get("csv_ip.csv"))
inFile.take(10).foreach(println)

Please help me resolve this error. Thanks in advance.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/get-java-io-FileNotFoundException-when-use-addFile-Function-tp23867.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
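A likely cause (an assumption, not confirmed in the thread): `SparkFiles.get` is evaluated on the driver here, so the path baked into `textFile` points at the driver's temp directory, which does not exist on the workers. `addFile`/`SparkFiles.get` is intended for opening the distributed copy inside tasks; `textFile` instead needs a path reachable from every node. A sketch of both options, with hypothetical paths:

```scala
import java.io.File
import org.apache.spark.SparkFiles
import scala.io.Source

// Option 1 (sketch): read the distributed copy inside tasks, where
// SparkFiles.get resolves to that executor's own local copy.
sc.addFile("file:///home/ubuntu/test/Spark_CSV/spark_csv_job/csv_ip.csv")
val lines = sc.parallelize(1 to 1).flatMap { _ =>
  Source.fromFile(new File(SparkFiles.get("csv_ip.csv"))).getLines().toList
}

// Option 2 (sketch): put the file on storage visible to all nodes and read
// it directly with textFile; no addFile needed.
val inFile = sc.textFile("hdfs:///data/csv_ip.csv") // hypothetical HDFS path
```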
Re: connector for CouchDB
I am also looking for a CouchDB connector for Spark. Did you find anything? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21422.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark connector for CouchDB
I am looking for a Spark connector for CouchDB. Please help me. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-connector-for-CouchDB-tp21421.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: connector for CouchDB
Yes please, but I am new to Spark and CouchDB. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21428.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: connector for CouchDB
I can also switch to MongoDB if Spark has support for it. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21429.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org