Re: Error report file is deleted automatically after spark application finished

2016-06-30 Thread prateek arora
Thanks for the information. My problem is resolved now.



I have one more issue.



I am not able to save the core dump file. It always shows: "# Failed to write core
dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c
unlimited" before starting Java again"



I set the core dump limit to unlimited on all nodes, using the settings below:
   Edit the /etc/security/limits.conf file and add a "* soft core unlimited"
line.

I rechecked  using :  $ ulimit -all

core file size  (blocks, -c) unlimited
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 241204
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 241204
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

but when my Spark application crashes, it still shows the error "Failed to
write core dump. Core dumps have been disabled. To enable core dumping, try
"ulimit -c unlimited" before starting Java again".


Regards

Prateek





On Wed, Jun 29, 2016 at 9:30 PM, dhruve ashar <dhruveas...@gmail.com> wrote:

> You can look at the yarn-default configuration file.
>
> Check your log related settings to see if log aggregation is enabled or
> also the log retention duration to see if its too small and files are being
> deleted.
>
> On Wed, Jun 29, 2016 at 4:47 PM, prateek arora <prateek.arora...@gmail.com
> > wrote:
>
>>
>> Hi
>>
>> My Spark application was crashed and show information
>>
>> LogType:stdout
>> Log Upload Time:Wed Jun 29 14:38:03 -0700 2016
>> LogLength:1096
>> Log Contents:
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGILL (0x4) at pc=0x7f67baa0d221, pid=12207, tid=140083473176320
>> #
>> # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build
>> 1.7.0_67-b01)
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode
>> linux-amd64 compressed oops)
>> # Problematic frame:
>> # C  [libcaffe.so.1.0.0-rc3+0x786221]  sgemm_kernel+0x21
>> #
>> # Failed to write core dump. Core dumps have been disabled. To enable core
>> dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> #
>>
>> /yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log
>>
>>
>>
>> but I am not able to found
>>
>> "/yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log"
>> file . its deleted  automatically after Spark application
>>  finished
>>
>>
>> how  to retain report file , i am running spark with yarn .
>>
>> Regards
>> Prateek
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Error-report-file-is-deleted-automatically-after-spark-application-finished-tp27247.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> -Dhruve Ashar
>
>


Error report file is deleted automatically after spark application finished

2016-06-29 Thread prateek arora

Hi

My Spark application crashed and shows the following information:

LogType:stdout
Log Upload Time:Wed Jun 29 14:38:03 -0700 2016
LogLength:1096
Log Contents:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x7f67baa0d221, pid=12207, tid=140083473176320
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build
1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# C  [libcaffe.so.1.0.0-rc3+0x786221]  sgemm_kernel+0x21
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
#
/yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log



but I am not able to find the
"/yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log"
file. It is deleted automatically after the Spark application finishes.


How can I retain the error report file? I am running Spark on YARN.
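(One setting that is commonly used to keep container working directories,
including the hs_err_pid*.log files, around after an application finishes is
yarn.nodemanager.delete.debug-delay-sec; the 600-second value below is only an
example:)

<!-- yarn-site.xml on the NodeManagers: keep finished containers' local dirs for 600 s -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>

Log aggregation only collects files from the container's log directory, while
the hs_err file above is written into the container's working directory under
appcache, which is why a deletion delay (or copying the file out before the
container exits, or redirecting it elsewhere with
spark.executor.extraJavaOptions=-XX:ErrorFile=/some/writable/path/hs_err_%p.log)
is needed to keep it.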

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Error-report-file-is-deleted-automatically-after-spark-application-finished-tp27247.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: How to enable core dump in spark

2016-06-16 Thread prateek arora
Hi

I am using Spark on YARN. How can I make sure that the ulimit settings
are applied to the Spark process?

I set the core dump limit to unlimited on all nodes:
   Edit the /etc/security/limits.conf file and add a "* soft core unlimited"
line.

I rechecked using:  $ ulimit -all

core file size  (blocks, -c) unlimited
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 241204
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 241204
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited
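(A quick way to see which limit the executors themselves actually run with,
rather than the limit of a login shell on a node, is to ask them from a small
throwaway job; this is only a sketch, and the partition count of 20 is
arbitrary:)

import scala.sys.process._

// Run one task per partition and report (hostname, core-file limit) from inside the executor JVM.
val limits = sc.parallelize(1 to 20, 20).map { _ =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  val coreLimit = Seq("bash", "-c", "ulimit -c").!!.trim
  (host, coreLimit)
}.distinct.collect()

limits.foreach(println)

If this prints 0 instead of unlimited, the limits.conf change is not reaching the
YARN containers, because they inherit the NodeManager's limits rather than those
of an interactive login.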

Regards
Prateek


On Thu, Jun 16, 2016 at 4:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Can you make sure that the ulimit settings are applied to the Spark
> process? Is this Spark on YARN or Standalone?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Wed, Jun 1, 2016 at 7:55 PM, prateek arora
> <prateek.arora...@gmail.com> wrote:
> > Hi
> >
> > I am using cloudera to  setup spark 1.6.0  on ubuntu 14.04 .
> >
> > I set core dump limit to unlimited in all nodes .
> >Edit  /etc/security/limits.conf file and add  " * soft core unlimited
> "
> > line.
> >
> > i rechecked  using :  $ ulimit -all
> >
> > core file size  (blocks, -c) unlimited
> > data seg size   (kbytes, -d) unlimited
> > scheduling priority (-e) 0
> > file size   (blocks, -f) unlimited
> > pending signals (-i) 241204
> > max locked memory   (kbytes, -l) 64
> > max memory size (kbytes, -m) unlimited
> > open files  (-n) 1024
> > pipe size(512 bytes, -p) 8
> > POSIX message queues (bytes, -q) 819200
> > real-time priority  (-r) 0
> > stack size  (kbytes, -s) 8192
> > cpu time   (seconds, -t) unlimited
> > max user processes  (-u) 241204
> > virtual memory  (kbytes, -v) unlimited
> > file locks  (-x) unlimited
> >
> > but when I am running my spark application with some third party native
> > libraries . but it crashes some time and show error " Failed to write
> core
> > dump. Core dumps have been disabled. To enable core dumping, try "ulimit
> -c
> > unlimited" before starting Java again " .
> >
> > Below are the log :
> >
> >  A fatal error has been detected by the Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x7fd44b491fb9, pid=20458, tid=140549318547200
> > #
> > # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build
> > 1.7.0_67-b01)
> > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode
> > linux-amd64 compressed oops)
> > # Problematic frame:
> > # V  [libjvm.so+0x650fb9]  jni_SetByteArrayRegion+0xa9
> > #
> > # Failed to write core dump. Core dumps have been disabled. To enable
> core
> > dumping, try "ulimit -c unlimited" before starting Java again
> > #
> > # An error report file with more information is saved as:
> > #
> >
> /yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
> > #
> > # If you would like to submit a bug report, please visit:
> > #   http://bugreport.sun.com/bugreport/crash.jsp
> > #
> >
> >
> > so how can i enable core dump and save it some place ?
> >
> > Regards
> > Prateek
> >
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-core-dump-in-spark-tp27065.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
>


Re: How to enable core dump in spark

2016-06-02 Thread prateek arora

Please help me to solve my problem.

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-core-dump-in-spark-tp27065p27081.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



How to enable core dump in spark

2016-06-01 Thread prateek arora
Hi 

I am using Cloudera to set up Spark 1.6.0 on Ubuntu 14.04.

I set the core dump limit to unlimited on all nodes:
   Edit the /etc/security/limits.conf file and add a "* soft core unlimited"
line.

I rechecked using:  $ ulimit -all

core file size  (blocks, -c) unlimited
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 241204
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 241204
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

but when I run my Spark application with some third-party native
libraries, it sometimes crashes and shows the error "Failed to write core
dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c
unlimited" before starting Java again".

Below is the log:

 A fatal error has been detected by the Java Runtime Environment: 
# 
#  SIGSEGV (0xb) at pc=0x7fd44b491fb9, pid=20458, tid=140549318547200 
# 
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build
1.7.0_67-b01) 
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode
linux-amd64 compressed oops) 
# Problematic frame: 
# V  [libjvm.so+0x650fb9]  jni_SetByteArrayRegion+0xa9 
# 
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again 
# 
# An error report file with more information is saved as: 
#
/yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
 
# 
# If you would like to submit a bug report, please visit: 
#   http://bugreport.sun.com/bugreport/crash.jsp
# 


So how can I enable core dumps and save them somewhere?
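(Independently of the limit itself, it can help to point core files at a
directory that exists and is writable by the container user on every node;
kernel.core_pattern is a standard Linux sysctl, and the path below is only an
example:)

# on every worker node, as root
$ mkdir -p /var/spark-cores && chmod 1777 /var/spark-cores
$ sysctl -w kernel.core_pattern=/var/spark-cores/core.%e.%p

With the default pattern the kernel writes the core file into the crashing
process's current working directory, which for a YARN container is a temporary
directory that is removed when the container finishes.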

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-core-dump-in-spark-tp27065.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to get and save core dump of native library in executors

2016-05-16 Thread prateek arora
Please help me to solve my problem.

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945p26967.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to get and save core dump of native library in executors

2016-05-13 Thread prateek arora
I am running my cluster on Ubuntu 14.04

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945p26952.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to get and save core dump of native library in executors

2016-05-12 Thread prateek arora
ubuntu 14.04

On Thu, May 12, 2016 at 2:40 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Which OS are you using ?
>
> See http://en.linuxreviews.org/HOWTO_enable_core-dumps
>
> On Thu, May 12, 2016 at 2:23 PM, prateek arora <prateek.arora...@gmail.com
> > wrote:
>
>> Hi
>>
>> I am running my spark application with some third party native libraries .
>> but it crashes some time and show error " Failed to write core dump. Core
>> dumps have been disabled. To enable core dumping, try "ulimit -c
>> unlimited"
>> before starting Java again " .
>>
>> Below are the log :
>>
>>  A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x7fd44b491fb9, pid=20458, tid=140549318547200
>> #
>> # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build
>> 1.7.0_67-b01)
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode
>> linux-amd64 compressed oops)
>> # Problematic frame:
>> # V  [libjvm.so+0x650fb9]  jni_SetByteArrayRegion+0xa9
>> #
>> # Failed to write core dump. Core dumps have been disabled. To enable core
>> dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> #
>>
>> /yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://bugreport.sun.com/bugreport/crash.jsp
>> #
>>
>>
>> so how can i enable core dump and save it some place ?
>>
>> Regards
>> Prateek
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


How to get and save core dump of native library in executors

2016-05-12 Thread prateek arora
Hi

I am running my Spark application with some third-party native libraries,
but it sometimes crashes and shows the error "Failed to write core dump. Core
dumps have been disabled. To enable core dumping, try "ulimit -c unlimited"
before starting Java again".

Below is the log:

 A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fd44b491fb9, pid=20458, tid=140549318547200
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build
1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x650fb9]  jni_SetByteArrayRegion+0xa9
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
#
/yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#


So how can I enable core dumps and save them somewhere?

Regards
Prateek




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



spark 1.6 : RDD Partitions not distributed evenly to executors

2016-05-09 Thread prateek arora
Hi

My Spark Streaming application receives data from one Kafka topic (one
partition), and the RDD has 30 partitions.

But the scheduler schedules all the tasks on executors running on the same host
(where the Kafka topic partition was created) with NODE_LOCAL locality level.

Below are the logs:

16/05/06 11:21:38 INFO YarnScheduler: Adding task set 1.0 with 30 tasks
16/05/06 11:21:38 DEBUG TaskSetManager: Epoch for TaskSet 1.0: 1
16/05/06 11:21:38 DEBUG TaskSetManager: Valid locality levels for TaskSet
1.0: NODE_LOCAL, RACK_LOCAL, ANY
16/05/06 11:21:38 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
1, ivcp-m04.novalocal, partition 0,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID
2, ivcp-m04.novalocal, partition 1,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID
3, ivcp-m04.novalocal, partition 2,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID
4, ivcp-m04.novalocal, partition 3,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID
5, ivcp-m04.novalocal, partition 4,NODE_LOCAL, 2248 bytes)



I have seen this scenario after upgrading Spark from 1.5 to 1.6; the same
application distributed RDD partitions evenly across executors in Spark 1.5.

As mentioned on some Spark developer blogs, I have tried
spark.shuffle.reduceLocality.enabled=false, and after that my RDD partitions
are distributed between executors on all hosts with PROCESS_LOCAL locality
level.

Below are the logs:

16/05/06 11:24:46 INFO YarnScheduler: Adding task set 1.0 with 30 tasks
16/05/06 11:24:46 DEBUG TaskSetManager: Valid locality levels for TaskSet
1.0: NO_PREF, ANY

16/05/06 11:24:46 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
1, ivcp-m02.novalocal, partition 0,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID
2, ivcp-m01.novalocal, partition 1,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID
3, ivcp-m06.novalocal, partition 2,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID
4, ivcp-m04.novalocal, partition 3,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID
5, ivcp-m04.novalocal, partition 4,PROCESS_LOCAL, 2248 bytes)





Is the above configuration the correct solution for this problem? And why is
spark.shuffle.reduceLocality.enabled not mentioned in the Spark configuration
documentation?
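(For reference, the flag can be passed at submit time without touching the
application code; spark.shuffle.reduceLocality.enabled is an internal,
undocumented setting, while the documented knob that also affects how long the
scheduler waits for a local slot is spark.locality.wait — the values, class name
and jar below are only examples:)

spark-submit \
  --master yarn \
  --conf spark.shuffle.reduceLocality.enabled=false \
  --conf spark.locality.wait=1s \
  --class com.example.MyStreamingApp \
  myapp.jar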



Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-RDD-Partitions-not-distributed-evenly-to-executors-tp26911.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: is there any way to submit spark application from outside of spark cluster

2016-03-25 Thread prateek arora
Hi

Thanks for the information. It will definitely solve my problem.

I have one more question: if I want to launch a Spark application in a
production environment, is there any other way for multiple users to
submit their jobs without having the Hadoop configuration?

Regards
Prateek


On Fri, Mar 25, 2016 at 10:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> See this thread:
>
> http://search-hadoop.com/m/q3RTtAvwgE7dEI02
>
> On Fri, Mar 25, 2016 at 10:39 AM, prateek arora <
> prateek.arora...@gmail.com> wrote:
>
>> Hi
>>
>> I want to submit spark application from outside of spark clusters .   so
>> please help me to provide a information regarding this.
>>
>> Regards
>> Prateek
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/is-there-any-way-to-submit-spark-application-from-outside-of-spark-cluster-tp26599.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


is there any way to submit spark application from outside of spark cluster

2016-03-25 Thread prateek arora
Hi

I want to submit a Spark application from outside of the Spark cluster, so
please help me with some information regarding this.
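(A common approach for YARN, sketched below with placeholder paths and class
names, is to copy the cluster's Hadoop/YARN client configuration to the
submitting machine and run spark-submit from there in cluster deploy mode:)

# on the machine outside the cluster
export HADOOP_CONF_DIR=/path/to/copied/hadoop-conf   # core-site.xml, hdfs-site.xml, yarn-site.xml from the cluster
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  myapp.jar

With --deploy-mode cluster the driver runs inside the cluster, so the external
machine only needs network access to the ResourceManager and HDFS.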

Regards
Prateek




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/is-there-any-way-to-submit-spark-application-from-outside-of-spark-cluster-tp26599.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread prateek arora
Hi

I do not want to create a single jar that contains all the other dependencies,
because it will increase the size of my Spark job jar.
So I want to copy all the libraries across the cluster using some automation
process, just as I am currently doing with Chef,
but I am not sure whether that is the right method or not.
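(As an alternative to pre-installing the libraries on every node, spark-submit
can ship them with the application; the option names below are standard
spark-submit options, while the file names and class name are only
placeholders:)

spark-submit \
  --master yarn \
  --jars /path/to/opencv_java.jar \              # extra jars shipped to the executors
  --files /path/to/libopencv_java.so \           # plain files placed in each container's working directory
  --conf spark.executor.extraLibraryPath=. \     # let the executor JVM load the .so from its working directory
  --class com.example.MyApp \
  myapp.jar

Pre-installing with Chef also works and avoids re-uploading large native
libraries on every submit; in that case spark.executor.extraLibraryPath (and
spark.driver.extraLibraryPath) just need to point at the install location on
each node.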


Regards
Prateek


On Mon, Mar 14, 2016 at 2:31 PM, Jakob Odersky <ja...@odersky.com> wrote:

> Have you tried setting the configuration
> `spark.executor.extraLibraryPath` to point to a location where your
> .so's are available? (Not sure if non-local files, such as HDFS, are
> supported)
>
> On Mon, Mar 14, 2016 at 2:12 PM, Tristan Nixon <st...@memeticlabs.org>
> wrote:
> > What build system are you using to compile your code?
> > If you use a dependency management system like maven or sbt, then you
> should be able to instruct it to build a single jar that contains all the
> other dependencies, including third-party jars and .so’s. I am a maven user
> myself, and I use the shade plugin for this:
> > https://maven.apache.org/plugins/maven-shade-plugin/
> >
> > However, if you are using SBT or another dependency manager, someone
> else on this list may be able to give you help on that.
> >
> > If you’re not using a dependency manager - well, you should be. Trying
> to manage this manually is a pain that you do not want to get in the way of
> your project. There are perfectly good tools to do this for you; use them.
> >
> >> On Mar 14, 2016, at 3:56 PM, prateek arora <prateek.arora...@gmail.com>
> wrote:
> >>
> >> Hi
> >>
> >> Thanks for the information .
> >>
> >> but my problem is that if i want to write spark application which
> depend on
> >> third party libraries like opencv then whats is the best approach to
> >> distribute all .so and jar file of opencv in all cluster ?
> >>
> >> Regards
> >> Prateek
> >>
> >>
> >>
> >> --
> >> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-distribute-dependent-files-so-jar-across-spark-worker-nodes-tp26464p26489.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
>


Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread prateek arora
Hi

Thanks for the information.

But my problem is that if I want to write a Spark application which depends on
third-party libraries like OpenCV, then what is the best approach to
distribute all the .so and jar files of OpenCV across the cluster?

Regards
Prateek  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-distribute-dependent-files-so-jar-across-spark-worker-nodes-tp26464p26489.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-11 Thread prateek arora
Hi

I have a multi-node cluster and my Spark jobs depend on a native
library (.so files) and some jar files.

Can someone please explain the best ways to distribute dependent
files across nodes?

Right now I copy the dependent files to all nodes using the Chef tool.

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-distribute-dependent-files-so-jar-across-spark-worker-nodes-tp26464.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark REST API shows Error 503 Service Unavailable

2015-12-17 Thread prateek arora
Hi Vikram ,

As per a Cloudera person:

" There is a minor bug with the way the classpath is setup for the Spark
HistoryServer in 5.5.0 which causes the observed error when using the REST
API (as a result of bad jersey versions (1.9) being included).

This will be fixed in CDH and CM 5.5.2 (yet to be released) onwards."

On Thu, Dec 17, 2015 at 3:24 PM, Vikram Kone <vikramk...@gmail.com> wrote:

> Hi Prateek,
> Were you able to figure why this is happening? I'm seeing the same error
> on my spark standalone cluster.
>
> Any pointers anyone?
>
> On Fri, Dec 11, 2015 at 2:05 PM, prateek arora <prateek.arora...@gmail.com
> > wrote:
>
>>
>>
>> Hi
>>
>> I am trying to access Spark Using REST API but got below error :
>>
>> Command :
>>
>> curl http://:18088/api/v1/applications
>>
>> Response:
>>
>>
>> 
>> 
>> 
>> Error 503 Service Unavailable
>> 
>> 
>> HTTP ERROR 503
>>
>> Problem accessing /api/v1/applications. Reason:
>> Service Unavailable
>> Caused by:
>> org.spark-project.jetty.servlet.ServletHolder$1:
>> java.lang.reflect.InvocationTargetException
>> at
>>
>> org.spark-project.jetty.servlet.ServletHolder.makeUnavailable(ServletHolder.java:496)
>> at
>>
>> org.spark-project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:543)
>> at
>>
>> org.spark-project.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:415)
>> at
>>
>> org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:657)
>> at
>>
>> org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
>> at
>>
>> org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
>> at
>>
>> org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
>> at
>>
>> org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
>> at
>>
>> org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> at
>>
>> org.spark-project.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
>> at
>>
>> org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> at
>>
>> org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> at org.spark-project.jetty.server.Server.handle(Server.java:370)
>> at
>>
>> org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
>> at
>>
>> org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
>> at
>>
>> org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
>> at
>> org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644)
>> at
>>
>> org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> at
>>
>> org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>> at
>>
>> org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>> at
>>
>> org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>> at
>>
>> org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> at
>>
>> org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>> at
>>
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at
>>
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at
>>
>> com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:728)
>> at
>>
>> com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:678)

Spark REST API shows Error 503 Service Unavailable

2015-12-11 Thread prateek arora


Hi
 
I am trying to access Spark using the REST API but got the error below:

Command :
 
curl http://:18088/api/v1/applications
 
Response:





Error 503 Service Unavailable


HTTP ERROR 503

Problem accessing /api/v1/applications. Reason:
Service Unavailable
Caused by:
org.spark-project.jetty.servlet.ServletHolder$1:
java.lang.reflect.InvocationTargetException
at
org.spark-project.jetty.servlet.ServletHolder.makeUnavailable(ServletHolder.java:496)
at
org.spark-project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:543)
at
org.spark-project.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:415)
at
org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:657)
at
org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
at
org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at
org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at
org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at
org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.spark-project.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
at
org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.spark-project.jetty.server.Server.handle(Server.java:370)
at
org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at
org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at
org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at
org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at
org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at
org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at
org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at
org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:728)
at
com.sun.jersey.spi.container.servlet.WebComponent.createResourceConfig(WebComponent.java:678)
at
com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:203)
at
com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:373)
at
com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:556)
at javax.servlet.GenericServlet.init(GenericServlet.java:244)
at
org.spark-project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:532)
... 22 more
Caused by: java.lang.NoSuchMethodError:
com.sun.jersey.core.reflection.ReflectionHelper.getOsgiRegistryInstance()Lcom/sun/jersey/core/osgi/OsgiRegistry;
at
com.sun.jersey.spi.scanning.AnnotationScannerListener$AnnotatedClassVisitor.getClassForName(AnnotationScannerListener.java:217)
at
com.sun.jersey.spi.scanning.AnnotationScannerListener$AnnotatedClassVisitor.visitEnd(AnnotationScannerListener.java:186)
at org.objectweb.asm.ClassReader.accept(Unknown Source)
at org.objectweb.asm.ClassReader.accept(Unknown Source)
at
com.sun.jersey.spi.scanning.AnnotationScannerListener.onProcess(AnnotationScannerListener.java:136)
at
com.sun.jersey.core.spi.scanning.JarFileScanner.scan(JarFileScanner.java:97)
at
com.sun.jersey.core.spi.scanning.uri.JarZipSchemeScanner$1.f(JarZipSchemeScanner.java:78)
at com.sun.jersey.core.util.Closing.f(Closing.java:71)
at
com.sun.jersey.core.spi.scanning.uri.JarZipSchemeScanner.scan(JarZipSchemeScanner.java:75)
at
com.sun.jersey.core.spi.scanning.PackageNamesScanner.scan(PackageNamesScanner.java:223)
at
com.sun.jersey.core.spi.scanning.PackageNamesScanner.scan(PackageNamesScanner.java:139)

can i process multiple batch in parallel in spark streaming

2015-12-09 Thread prateek arora
Hi

When I run my Spark Streaming application, the following information shows on
the application's streaming UI.
I am using Spark 1.5.0.


Batch Time            Input Size   Scheduling Delay (?)   Processing Time (?)   Status
2015/12/09 11:00:42   107 events   -                      -                     queued
2015/12/09 11:00:41   103 events   -                      -                     queued
2015/12/09 11:00:40   107 events   -                      -                     queued
2015/12/09 11:00:39   105 events   -                      -                     queued
2015/12/09 11:00:38   109 events   -                      -                     queued
2015/12/09 11:00:37   106 events   -                      -                     queued
2015/12/09 11:00:36   109 events   -                      -                     queued
2015/12/09 11:00:35   113 events   -                      -                     queued
2015/12/09 11:00:34   109 events   -                      -                     queued
2015/12/09 11:00:33   107 events   -                      -                     queued
2015/12/09 11:00:32   99 events    42 s                   -                     processing



It seems the batches are pushed into a queue and processed in a FIFO manner.
Is it possible for all my active batches to start processing in parallel?
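(For reference, the setting usually mentioned for this is
spark.streaming.concurrentJobs, which is undocumented and experimental; the
value below is only an example, and running batches concurrently is generally
considered safe only when the batches are truly independent:)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("my-streaming-app")                      // placeholder app name
  .set("spark.streaming.concurrentJobs", "4")          // allow up to 4 batch jobs to run concurrently
val ssc = new StreamingContext(conf, Seconds(1))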

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/can-i-process-multiple-batch-in-parallel-in-spark-streaming-tp25653.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: can i process multiple batch in parallel in spark streaming

2015-12-09 Thread prateek arora
Hi, thanks.

In my scenario the batches are independent, so is it safe to use in a production
environment?

Regards
Prateek

On Wed, Dec 9, 2015 at 11:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you seen this thread ?
>
> http://search-hadoop.com/m/q3RTtgSGrobJ3Je
>
> On Wed, Dec 9, 2015 at 11:12 AM, prateek arora <prateek.arora...@gmail.com
> > wrote:
>
>> Hi
>>
>> when i run my spark streaming application .. following information show on
>> application streaming UI.
>> i am using spark 1.5.0
>>
>>
>> Batch Time  Input Size   Scheduling Delay (?) Processing Time
>> (?)
>> Status
>> 2015/12/09 11:00:42 107 events  -   -
>>queued
>> 2015/12/09 11:00:41 103 events  -   -
>>queued
>> 2015/12/09 11:00:40 107 events  -   -
>>queued
>> 2015/12/09 11:00:39 105 events  -   -
>>queued
>> 2015/12/09 11:00:38 109 events  -   -
>>queued
>> 2015/12/09 11:00:37 106 events  -   -
>>queued
>> 2015/12/09 11:00:36 109 events  -   -
>>queued
>> 2015/12/09 11:00:35 113 events  -   -
>>queued
>> 2015/12/09 11:00:34 109 events  -   -
>>queued
>> 2015/12/09 11:00:33 107 events  -   -
>>queued
>> 2015/12/09 11:00:32 99 events   42 s-
>>processing
>>
>>
>>
>> it seems batches push into queue and work like FIFO manner  . is it
>> possible
>> all my Active batches start processing in parallel.
>>
>> Regards
>> Prateek
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/can-i-process-multiple-batch-in-parallel-in-spark-streaming-tp25653.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


can i write only RDD transformation into hdfs or any other storage system

2015-12-08 Thread prateek arora
Hi

Is it possible in Spark to write only an RDD transformation to HDFS or any
other storage system?

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/can-i-write-only-RDD-transformation-into-hdfs-or-any-other-storage-system-tp25637.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-04 Thread prateek arora
Hi

I want to create multiple SparkContexts in my application.
I have read many articles that suggest "usage of multiple contexts is
discouraged, since SPARK-2243 is still not resolved."
I want to know whether Spark 1.5.0 supports creating multiple contexts
without error,
and, if it is supported, whether we need to set the
"spark.driver.allowMultipleContexts" configuration parameter.

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/is-Multiple-Spark-Contexts-is-supported-in-spark-1-5-0-tp25568.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-04 Thread prateek arora
Hi Ted,
Thanks for the information.
Is there any way that two different Spark applications can share their data?

Regards
Prateek

On Fri, Dec 4, 2015 at 9:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> See Josh's response in this thread:
>
>
> http://search-hadoop.com/m/q3RTt1z1hUw4TiG1=Re+Question+about+yarn+cluster+mode+and+spark+driver+allowMultipleContexts
>
> Cheers
>
> On Fri, Dec 4, 2015 at 9:46 AM, prateek arora <prateek.arora...@gmail.com>
> wrote:
>
>> Hi
>>
>> I want to create multiple sparkContext in my application.
>> i read so many articles they suggest " usage of multiple contexts is
>> discouraged, since SPARK-2243 is still not resolved."
>> i want to know that Is spark 1.5.0 supported to create multiple contexts
>> without error ?
>> and if supported then are we need to set
>> "spark.driver.allowMultipleContexts" configuration parameter ?
>>
>> Regards
>> Prateek
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/is-Multiple-Spark-Contexts-is-supported-in-spark-1-5-0-tp25568.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-04 Thread prateek arora
Thanks ...

Is there any way my second application can run in parallel and wait to
fetch data from HBase or any other data storage system?

Regards
Prateek

On Fri, Dec 4, 2015 at 10:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> How about using NoSQL data store such as HBase :-)
>
> On Fri, Dec 4, 2015 at 10:17 AM, prateek arora <prateek.arora...@gmail.com
> > wrote:
>
>> Hi Ted
>> Thanks for the information .
>> is there any way that two different spark application share there data ?
>>
>> Regards
>> Prateek
>>
>> On Fri, Dec 4, 2015 at 9:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> See Josh's response in this thread:
>>>
>>>
>>> http://search-hadoop.com/m/q3RTt1z1hUw4TiG1=Re+Question+about+yarn+cluster+mode+and+spark+driver+allowMultipleContexts
>>>
>>> Cheers
>>>
>>> On Fri, Dec 4, 2015 at 9:46 AM, prateek arora <
>>> prateek.arora...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I want to create multiple sparkContext in my application.
>>>> i read so many articles they suggest " usage of multiple contexts is
>>>> discouraged, since SPARK-2243 is still not resolved."
>>>> i want to know that Is spark 1.5.0 supported to create multiple contexts
>>>> without error ?
>>>> and if supported then are we need to set
>>>> "spark.driver.allowMultipleContexts" configuration parameter ?
>>>>
>>>> Regards
>>>> Prateek
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/is-Multiple-Spark-Contexts-is-supported-in-spark-1-5-0-tp25568.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> -
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>>>
>>
>


how to spark streaming application start working on next batch before completing on previous batch .

2015-12-03 Thread prateek arora
Hi

I am using Spark Streaming with Kafka. The Spark version is 1.5.0 and the batch
interval is 1 second.

In my scenario, the algorithm takes 7-10 seconds to process one batch period of
data, so Spark Streaming starts processing the next batch only after completing
the previous batch.

I want my Spark Streaming application to start working on the next batch
before completing the previous batch, i.e., the batches should execute in
parallel.

Please help me to solve this problem.

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-spark-streaming-application-start-working-on-next-batch-before-completing-on-previous-batch-tp25559.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: how can evenly distribute my records in all partition

2015-11-18 Thread prateek arora
Hi
Thanks for the help.
In my case:
I want to perform an operation on 30 records per second using Spark Streaming.
The difference between the keys of the records is around 33-34 ms, and my RDD
of 30 records already has 4 partitions.
Right now my algorithm takes around 400 ms to perform the operation on 1 record,
so I want to distribute my records evenly so that every executor performs the
operation on only one record and my 1-second batch completes
without delay.
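(A minimal sketch of such a partitioner, assuming the keys are millisecond
timestamps roughly 33 ms apart and that 30 partitions are wanted; the
frame-interval constant is an assumption, not a value confirmed in this thread:)

import org.apache.spark.Partitioner

class TimestampPartitioner(numParts: Int, frameIntervalMs: Long = 33L) extends Partitioner {
  override def numPartitions: Int = numParts
  override def getPartition(key: Any): Int = {
    val ts = key.asInstanceOf[Long]
    // Consecutive frames land in consecutive partitions, so each of the
    // numParts partitions (and hence each executor) gets roughly one record per batch.
    ((ts / frameIntervalMs) % numParts).toInt
  }
  override def equals(other: Any): Boolean = other match {
    case p: TimestampPartitioner => p.numPartitions == numParts
    case _ => false
  }
  override def hashCode: Int = numParts
}

// usage on a key/value RDD of (timestamp, jpegBytes):
// val evenRdd = rdd.partitionBy(new TimestampPartitioner(30))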


On Tue, Nov 17, 2015 at 7:50 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> Think about how you want to distribute your data and how your keys are
> spread currently. Do you want to compute something per day, per week etc.
> Based on that, return a partition number. You could use mod 30 or some such
> function to get the partitions.
> On Nov 18, 2015 5:17 AM, "prateek arora" <prateek.arora...@gmail.com>
> wrote:
>
>> Hi
>> I am trying to implement custom partitioner using this link
>> http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
>> ( in link example key value is from 0 to (noOfElement - 1))
>>
>> but not able to understand how i  implement  custom partitioner  in my
>> case:
>>
>> my parent RDD have 4 partition and RDD key is : TimeStamp and Value is
>> JPEG Byte Array
>>
>>
>> Regards
>> Prateek
>>
>>
>> On Tue, Nov 17, 2015 at 9:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Please take a look at the following for example:
>>>
>>> ./core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala
>>> ./core/src/main/scala/org/apache/spark/Partitioner.scala
>>>
>>> Cheers
>>>
>>> On Tue, Nov 17, 2015 at 9:24 AM, prateek arora <
>>> prateek.arora...@gmail.com> wrote:
>>>
>>>> Hi
>>>> Thanks
>>>> I am new in spark development so can you provide some help to write a
>>>> custom partitioner to achieve this.
>>>> if you have and link or example to write custom partitioner please
>>>> provide to me.
>>>>
>>>> On Mon, Nov 16, 2015 at 6:13 PM, Sabarish Sasidharan <
>>>> sabarish.sasidha...@manthan.com> wrote:
>>>>
>>>>> You can write your own custom partitioner to achieve this
>>>>>
>>>>> Regards
>>>>> Sab
>>>>> On 17-Nov-2015 1:11 am, "prateek arora" <prateek.arora...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I have a RDD with 30 record ( Key/value pair ) and running 30
>>>>>> executor . i
>>>>>> want to reparation this RDD in to 30 partition so every partition
>>>>>> get one
>>>>>> record and assigned to one executor .
>>>>>>
>>>>>> when i used rdd.repartition(30) its repartition my rdd in 30
>>>>>> partition but
>>>>>> some partition get 2 record , some get 1 record and some not getting
>>>>>> any
>>>>>> record .
>>>>>>
>>>>>> is there any way in spark so i can evenly distribute my record in all
>>>>>> partition .
>>>>>>
>>>>>> Regards
>>>>>> Prateek
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/how-can-evenly-distribute-my-records-in-all-partition-tp25394.html
>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.
>>>>>>
>>>>>> -
>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>>
>>>>>>
>>>>
>>>
>>


Re: how can evenly distribute my records in all partition

2015-11-17 Thread prateek arora
Hi
Thanks
I am new to Spark development, so could you provide some help with writing a
custom partitioner to achieve this?
If you have any link or example for writing a custom partitioner, please share
it with me.

On Mon, Nov 16, 2015 at 6:13 PM, Sabarish Sasidharan <
sabarish.sasidha...@manthan.com> wrote:

> You can write your own custom partitioner to achieve this
>
> Regards
> Sab
> On 17-Nov-2015 1:11 am, "prateek arora" <prateek.arora...@gmail.com>
> wrote:
>
>> Hi
>>
>> I have a RDD with 30 record ( Key/value pair ) and running 30 executor . i
>> want to reparation this RDD in to 30 partition so every partition  get one
>> record and assigned to one executor .
>>
>> when i used rdd.repartition(30) its repartition my rdd in 30 partition but
>> some partition get 2 record , some get 1 record and some not getting any
>> record .
>>
>> is there any way in spark so i can evenly distribute my record in all
>> partition .
>>
>> Regards
>> Prateek
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/how-can-evenly-distribute-my-records-in-all-partition-tp25394.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>


Re: how can evenly distribute my records in all partition

2015-11-17 Thread prateek arora
Hi
I am trying to implement a custom partitioner using this link:
http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
(in the linked example the key value runs from 0 to (noOfElement - 1)),

but I am not able to understand how to implement a custom partitioner in my case:

my parent RDD has 4 partitions, and the RDD key is a timestamp and the value is a
JPEG byte array.


Regards
Prateek


On Tue, Nov 17, 2015 at 9:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Please take a look at the following for example:
>
> ./core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala
> ./core/src/main/scala/org/apache/spark/Partitioner.scala
>
> Cheers
>
> On Tue, Nov 17, 2015 at 9:24 AM, prateek arora <prateek.arora...@gmail.com
> > wrote:
>
>> Hi
>> Thanks
>> I am new in spark development so can you provide some help to write a
>> custom partitioner to achieve this.
>> if you have and link or example to write custom partitioner please
>> provide to me.
>>
>> On Mon, Nov 16, 2015 at 6:13 PM, Sabarish Sasidharan <
>> sabarish.sasidha...@manthan.com> wrote:
>>
>>> You can write your own custom partitioner to achieve this
>>>
>>> Regards
>>> Sab
>>> On 17-Nov-2015 1:11 am, "prateek arora" <prateek.arora...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I have a RDD with 30 record ( Key/value pair ) and running 30 executor
>>>> . i
>>>> want to reparation this RDD in to 30 partition so every partition  get
>>>> one
>>>> record and assigned to one executor .
>>>>
>>>> when i used rdd.repartition(30) its repartition my rdd in 30 partition
>>>> but
>>>> some partition get 2 record , some get 1 record and some not getting any
>>>> record .
>>>>
>>>> is there any way in spark so i can evenly distribute my record in all
>>>> partition .
>>>>
>>>> Regards
>>>> Prateek
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/how-can-evenly-distribute-my-records-in-all-partition-tp25394.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> -
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>>
>


how can evenly distribute my records in all partition

2015-11-16 Thread prateek arora
Hi

I have an RDD with 30 records (key/value pairs) and I am running 30 executors. I
want to repartition this RDD into 30 partitions so that every partition gets one
record and is assigned to one executor.

When I use rdd.repartition(30), it repartitions my RDD into 30 partitions, but
some partitions get 2 records, some get 1 record, and some do not get any
record.

Is there any way in Spark to evenly distribute my records across all
partitions?

Regards
Prateek



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-can-evenly-distribute-my-records-in-all-partition-tp25394.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Sprk RDD : want to combine elements that have approx same keys

2015-09-10 Thread prateek arora
Hi

In my scenario:
I have an RDD with key/value pairs. I want to combine elements that have
approximately the same keys,
like
(144,value)(143,value)(142,value)...(214,value)(213,value)(212,value)(313,value)(314,value)...

I want to combine the elements with keys 144, 143, 142, ..., meaning keys within
a +-2 range,
and likewise for the 214, 213, 212 keys, and so on.

How can I do this?
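(One rough way to do this, sketched below on made-up sample data, is to map each
key to a fixed-width bucket before combining; the width of 5 matches the +-2
range but is an approximation, since a run of keys that straddles a bucket
boundary will be split across two buckets:)

val rdd = sc.parallelize(Seq((142, "a"), (143, "b"), (144, "c"),
                             (212, "d"), (213, "e"), (214, "f")))
val bucketWidth = 5
val combined = rdd
  .map { case (k, v) => ((k / bucketWidth) * bucketWidth, List(v)) }  // 142,143,144 -> bucket 140
  .reduceByKey(_ ++ _)                                                // combine the values within a bucket
combined.collect().foreach(println)   // e.g. (140,List(a, b, c)), (210,List(d, e, f))

A more exact alternative is to sort by key and start a new group whenever the
gap to the previous key exceeds 2, but that is no longer a simple per-key
shuffle operation.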

regards
prateek

 







--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Sprk-RDD-want-to-combine-elements-that-have-approx-same-keys-tp24644.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



get java.io.FileNotFoundException when use addFile Function

2015-07-15 Thread prateek arora
I am trying to write a simple program using the addFile function, but I am
getting an error on my worker node that the file does not exist:

Stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost
task 0.3 in stage 0.0 (TID 3, slave2.novalocal):
java.io.FileNotFoundException: File
file:/tmp/spark-2791415c-8c20-4920-b3cd-5a6b8b6f3f8d/userFiles-a5a98f06-2d38-4876-8c8c-82a10ac5431f/csv_ip.csv
does not exist

The code is as below:

val sc = new SparkContext(sparkConf)
val inputFile = "file:///home/ubuntu/test/Spark_CSV/spark_csv_job/csv_ip.csv"
sc.addFile(inputFile)
val inFile = sc.textFile("file://" + SparkFiles.get("csv_ip.csv"))
inFile.take(10).foreach(println)

Please help me resolve this error. Thanks in advance.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/get-java-io-FileNotFoundException-when-use-addFile-Function-tp23867.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



get java.io.FileNotFoundException when use addFile Function

2015-07-15 Thread prateek arora
Hi

I am trying to write a simple program using the addFile function, but I am
getting an error on my worker node that the file does not exist:

Stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost
task 0.3 in stage 0.0 (TID 3, slave2.novalocal):
java.io.FileNotFoundException: File
file:/tmp/spark-2791415c-8c20-4920-b3cd-5a6b8b6f3f8d/userFiles-a5a98f06-2d38-4876-8c8c-82a10ac5431f/csv_ip.csv
does not exist

The code is as below:

val sc = new SparkContext(sparkConf)
val inputFile = "file:///home/ubuntu/test/Spark_CSV/spark_csv_job/csv_ip.csv"
sc.addFile(inputFile)
val inFile = sc.textFile("file://" + SparkFiles.get("csv_ip.csv"))
inFile.take(10).foreach(println)

Please help me resolve this error. Thanks in advance.


Regards
prateek


Re: connector for CouchDB

2015-01-29 Thread prateek arora
I am also looking for a CouchDB connector for Spark. Did you find anything?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21422.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



spark connector for CouchDB

2015-01-29 Thread prateek arora
I am looking for a Spark connector for CouchDB. Please help me.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-connector-for-CouchDB-tp21421.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: connector for CouchDB

2015-01-29 Thread prateek arora
Yes please, but I am new to Spark and CouchDB.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21428.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: connector for CouchDB

2015-01-29 Thread prateek arora
I can also switch to MongoDB if Spark has support for it.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21429.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org