Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-25 Thread Utkarsh Sengar
I was passing spark.executor.extraClassPath a comma-separated list of jars, the way --jars takes it, but it should be ':'-separated like a regular classpath. Thanks for your help! -Utkarsh On Mon, Aug 24, 2015 at 5:05 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: I get the same error even when I set the SPARK_CLASSPATH: export
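
For reference, a minimal sketch of the distinction (the jar paths are hypothetical):

    import org.apache.spark.SparkConf;

    // --jars on spark-submit takes a comma-separated list;
    // spark.executor.extraClassPath is a regular ':'-separated classpath.
    SparkConf conf = new SparkConf()
        .set("spark.executor.extraClassPath",
             "/opt/libs/logback-classic.jar:/opt/libs/logback-core.jar");

The same value can also be passed on the command line with --conf spark.executor.extraClassPath=... at spark-submit time.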

Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-24 Thread Utkarsh Sengar
Continuing this discussion: http://apache-spark-user-list.1001560.n3.nabble.com/same-log4j-slf4j-error-in-spark-9-1-td5592.html I am getting this error when I use logback-classic. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-24 Thread Utkarsh Sengar
? On Mon, Aug 24, 2015 at 2:50 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Continuing this discussion: http://apache-spark-user-list.1001560.n3.nabble.com/same-log4j-slf4j-error-in-spark-9-1-td5592.html I am getting this error when I use logback-classic. SLF4J: Class path contains

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-24 Thread Utkarsh Sengar
, Utkarsh Sengar utkarsh2...@gmail.com wrote: I assumed that's the case because of the error I got and the documentation, which says: Extra classpath entries to append to the classpath of the driver. This is where I stand now: <dependency> <groupId>org.apache.spark</groupId>

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-24 Thread Utkarsh Sengar
take precedence over the log4j binding embedded in the Spark assembly. On Mon, Aug 24, 2015 at 3:15 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Hi Marcelo, When I add this exclusion rule to my pom: <dependency> <groupId>org.apache.spark</groupId>

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-24 Thread Utkarsh Sengar
) at com.opentable.logging.AssimilateForeignLoggingHook.automaticAssimilationHook(AssimilateForeignLoggingHook.java:28) at com.opentable.logging.Log.<clinit>(Log.java:31) ... 16 more Thanks, -Utkarsh On Mon, Aug 24, 2015 at 4:11 PM, Marcelo Vanzin van...@cloudera.com wrote: On Mon, Aug 24, 2015 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com

Re: spark.mesos.coarse impacts memory performance on mesos

2015-10-09 Thread Utkarsh Sengar
Hi Tim, Any way I can provide more info on this? On Thu, Oct 1, 2015 at 4:21 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote: > Not sure what you mean by that; I shared the data which I see in the spark UI. > Can you point me to a location where I can precisely get the data you need

Porting a multi-threaded compute-intensive job to Spark

2015-08-27 Thread Utkarsh Sengar
I am working on code which uses an executor service to parallelize tasks (think machine learning computations done over a small dataset over and over again). My goal is to execute some code as fast as possible, multiple times, and store the result somewhere (total executions will be on the order of 100M
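
A minimal sketch of the usual translation, assuming each unit of work is a pure function of its input and given an existing JavaSparkContext jsc (taskIds, runSimulation, and the output path are hypothetical):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;

    // Each element stands in for one job previously submitted to the executor service.
    List<Integer> taskIds = new ArrayList<>();
    for (int i = 0; i < 1000000; i++) taskIds.add(i);

    JavaRDD<Double> results = jsc.parallelize(taskIds, 1000) // 1000 partitions
        .map(id -> runSimulation(id));                       // computed in parallel on executors
    results.saveAsTextFile("/tmp/results");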

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-25 Thread Utkarsh Sengar
So do I need to manually copy these 2 jars onto my spark executors? On Tue, Aug 25, 2015 at 10:51 AM, Marcelo Vanzin van...@cloudera.com wrote: On Tue, Aug 25, 2015 at 10:48 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Now I am going to try it out on our mesos cluster. I assumed

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-25 Thread Utkarsh Sengar
at 1:50 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: So do I need to manually copy these 2 jars onto my spark executors? Yes. I can think of a way to work around that if you're using YARN, but not with other cluster managers. On Tue, Aug 25, 2015 at 10:51 AM, Marcelo Vanzin van

RDD transformation and action running out of memory

2015-09-12 Thread Utkarsh Sengar
I am trying to run this, a basic mapToPair and then count() to trigger an action. 4 executors are launched but I don't see any relevant logs on those executors. It looks like the driver is pulling all the data and it runs out of memory; the dataset is big, so it won't fit on 1 machine. So
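
For context, the pattern in question is roughly this (a sketch; rdd, keyOf, and the element types are hypothetical):

    import scala.Tuple2;
    import org.apache.spark.api.java.JavaPairRDD;

    JavaPairRDD<String, Long> pairs = rdd.mapToPair(x -> new Tuple2<>(keyOf(x), 1L));
    long count = pairs.count(); // only a single long comes back to the driver

Since count() itself returns just a number, a driver-side OOM here usually points at how the input splits are computed rather than at the action (see the follow-up below).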

Re: RDD transformation and action running out of memory

2015-09-13 Thread Utkarsh Sengar
> split object is, but it's plausible that the process of creating an array > of 6.5 million of them is causing you to run out of memory. > > I think the reason you don't see anything in the executor logs is that the > exception is occurring before the work is tasked to the execut

Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-28 Thread Utkarsh Sengar
with one node right? > > And when the job is running could you open the Spark webui and get stats > about the heap size and other java settings? > > Tim > > On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar <utkarsh2...@gmail.com> > wrote: > >> Bumping this one up,

Re: spark.mesos.coarse impacts memory performance on mesos

2015-10-01 Thread Utkarsh Sengar
with fine vs > coarse grain mode look like? > > Tim > > On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar <utkarsh2...@gmail.com> > wrote: > >> Bumping it up, its not really a blocking issue. >> But fine grain mode eats up uncertain number of resources in mesos a

Re: spark.mesos.coarse impacts memory performance on mesos

2015-10-01 Thread Utkarsh Sengar
Bumping it up, it's not really a blocking issue. But fine grain mode eats up an uncertain number of resources in mesos and launches tons of tasks, so I would prefer using the coarse grained mode if only it didn't run out of memory. Thanks, -Utkarsh On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar
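
For context, a sketch of the relevant settings (the values are illustrative): coarse-grained mode holds onto executors for the life of the job, so capping its resource claim matters more there.

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .set("spark.mesos.coarse", "true")   // coarse-grained Mesos mode
        .set("spark.cores.max", "8")         // cap the total cores claimed from Mesos
        .set("spark.executor.memory", "4g"); // per-executor heap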

Re: How to share memory in a broadcast between tasks in the same executor?

2015-09-22 Thread Utkarsh Sengar
If the broadcast variable doesn't fit in memory, I think it is not the right fit for you. You can think about folding it into an RDD as a tuple with the other data you are working on. Say you are working on an RDD (rdd in your case); run a map/reduce to convert it to an RDD of tuples, so now
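
A sketch of that reshaping, assuming the oversized broadcast data can be keyed the same way as the main RDD (jsc, rdd, keyOf, lookupEntries, and the element types are hypothetical):

    import scala.Tuple2;
    import org.apache.spark.api.java.JavaPairRDD;

    // Turn the would-be broadcast data into an RDD and join on a shared key
    // instead of shipping the whole structure to every task.
    JavaPairRDD<String, Integer> lookup = jsc.parallelizePairs(lookupEntries);
    JavaPairRDD<String, String> keyed = rdd.mapToPair(x -> new Tuple2<>(keyOf(x), x));
    JavaPairRDD<String, Tuple2<String, Integer>> joined = keyed.join(lookup);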

Re: How does one use s3 for checkpointing?

2015-09-21 Thread Utkarsh Sengar
We are using "spark-1.4.1-bin-hadoop2.4" on mesos (not EMR) with s3 to read and write data and haven't noticed any inconsistencies with it, so 1 (mostly) and 2 definitely should not be a problem. Regarding 3, are you setting the file system impl in spark config?

spark.mesos.coarse impacts memory performance on mesos

2015-09-21 Thread Utkarsh Sengar
I am running Spark 1.4.1 on mesos. The spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of size 100, 100, 7 and 1 respectively. Let's call it productRDD. Creation of "aRdd" needs a data pull from multiple data sources, merging them and creating a tuple of JavaRDD; finally aRDD looks
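
For context, the shape of that product looks roughly like this (a sketch with hypothetical element types): each cartesian call nests the previous result as the key of a pair, and the sizes multiply, so 100 x 100 x 7 x 1 = 70,000 combinations in total.

    import scala.Tuple2;
    import org.apache.spark.api.java.JavaPairRDD;

    JavaPairRDD<String, String> ab = aRdd.cartesian(bRdd);                // 100 x 100 = 10,000
    JavaPairRDD<Tuple2<String, String>, String> abc = ab.cartesian(cRdd); // x 7 = 70,000
    JavaPairRDD<Tuple2<Tuple2<String, String>, String>, String> productRDD =
        abc.cartesian(dRdd);                                              // x 1 = 70,000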

Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-24 Thread Utkarsh Sengar
Bumping this one up, any suggestions on the stacktrace? spark.mesos.coarse=true is not working and the driver crashed with the error. On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote: > I missed doing a reply-all. > > Tim, > > spark.mesos.coars

RegressionModelEvaluator (from jpmml) NotSerializableException when instantiated in the driver

2015-12-09 Thread Utkarsh Sengar
I am trying to load a PMML file in a spark job, instantiate it only once, and pass it to the executors. But I get a NotSerializableException for org.xml.sax.helpers.LocatorImpl, which is used inside jpmml. I have this class Prediction.java: public class Prediction implements Serializable {
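
A common workaround for non-serializable third-party objects (a sketch, not the resolution from this thread): build the evaluator lazily inside mapPartitions, so it is constructed on the executors instead of being serialized from the driver. inputRdd, loadEvaluator, and score are hypothetical, and the Spark 2.x mapPartitions signature (the function returns an Iterator) is assumed.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.jpmml.evaluator.RegressionModelEvaluator;

    JavaRDD<Double> scores = inputRdd.mapPartitions(rows -> {
        // Built here, on the executor, so it never crosses the wire.
        RegressionModelEvaluator evaluator = loadEvaluator("model.pmml");
        List<Double> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(score(evaluator, rows.next()));
        }
        return out.iterator();
    });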

How to deal with tasks running too long?

2016-06-15 Thread Utkarsh Sengar
This SO question was asked about a year ago. http://stackoverflow.com/questions/31799755/how-to-deal-with-tasks-running-too-long-comparing-to-others-in-job-in-yarn-cli I answered this question with a suggestion to try speculation, but it doesn't quite do what the OP expects. I have been running into
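
For reference, a sketch of the speculation settings in question (the values are illustrative):

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .set("spark.speculation", "true")
        .set("spark.speculation.multiplier", "2")  // candidate when 2x slower than the median task
        .set("spark.speculation.quantile", "0.9"); // only after 90% of the stage's tasks finish

Note that speculation launches a second copy of a slow task and keeps whichever finishes first; it never kills or time-limits a task, which is why it may not do what the OP expects.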

Re: How to deal with tasks running too long?

2016-06-16 Thread Utkarsh Sengar
laskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Thu, Jun 16, 2016 at 6:45 AM, Utkarsh Sengar <utkarsh2...@gmail.com> > wrote: > > This SO question was asked about 1yr ago. > > > http:

LogisticRegressionModel not able to load serialized model from S3

2016-02-08 Thread Utkarsh Sengar
I am storing a model in s3 in this path: "bucket_name/p1/models/lr/20160204_0410PM/ser" and the structure of the saved dir looks like this: 1. bucket_name/p1/models/lr/20160204_0410PM/ser/data -> _SUCCESS, _metadata, _common_metadata and
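
For context, the save/load pair in play looks like this (a sketch against the RDD-based MLlib API; the s3a scheme is an assumption, jsc is an existing JavaSparkContext, model is the trained model, and the path mirrors the one above):

    import org.apache.spark.mllib.classification.LogisticRegressionModel;

    model.save(jsc.sc(), "s3a://bucket_name/p1/models/lr/20160204_0410PM/ser");
    LogisticRegressionModel loaded =
        LogisticRegressionModel.load(jsc.sc(), "s3a://bucket_name/p1/models/lr/20160204_0410PM/ser");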

Using accumulator to push custom logs to driver

2016-02-01 Thread Utkarsh Sengar
I am trying to debug code executed in executors by logging. Even when I add log4j's LOG.info(..) inside .map(), I don't see it in the mesos task logs on the corresponding slaves. It's inefficient anyway to keep checking multiple slaves for logs. One way to deal with this is to push logs to a central
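
One way to do that with accumulators (a sketch using the Spark 2.x CollectionAccumulator API; in 1.x the equivalent is a custom AccumulatorParam; jsc and rdd are assumed to exist):

    import org.apache.spark.util.CollectionAccumulator;

    CollectionAccumulator<String> logAcc = jsc.sc().collectionAccumulator("task-logs");
    rdd.map(x -> {
        logAcc.add("processed " + x); // recorded on the executor, merged back with task results
        return x;
    }).count();                       // values only materialize after an action
    logAcc.value().forEach(System.out::println);

One caveat worth noting: retried or speculatively executed tasks can add their entries more than once, so accumulator-based logs are approximate.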

Re: LogisticRegressionModel not able to load serialized model from S3

2016-02-11 Thread Utkarsh Sengar
The problem turned out to be corrupt parquet data; the error from Spark was a bit misleading, though. On Mon, Feb 8, 2016 at 3:41 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote: > I am storing a model in s3 in this path: > "bucket_name/p1/models/lr/20160204_0410PM/ser"

Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:///

2016-08-02 Thread Utkarsh Sengar
Upgraded to Spark 2.0 and tried to load a model: LogisticRegressionModel model = LogisticRegressionModel.load(sc.sc(), "s3a://cruncher/c/models/lr/"); Getting this error: Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://spark-warehouse, expected: file:/// Full

Re: Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:///

2016-08-02 Thread Utkarsh Sengar
I don't think it's a related problem, although setting "spark.sql.warehouse.dir" to /tmp in the spark config fixed it. On Tue, Aug 2, 2016 at 5:02 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote: > Do we have a workaround for this problem? > Can I overwrite that using some con
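
For reference, a sketch of that workaround as it would look when building the session (the app name is hypothetical):

    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
        .appName("my-app")
        .config("spark.sql.warehouse.dir", "/tmp") // the workaround described above
        .getOrCreate();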

Re: Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:///

2016-08-02 Thread Utkarsh Sengar
it's stalled > and needs an update. > > On Tue, Aug 2, 2016 at 4:47 PM, Utkarsh Sengar <utkarsh2...@gmail.com> > wrote: > > Upgraded to spark2.0 and tried to load a model: > > LogisticRegressionModel model = LogisticRegressionModel.load(sc.sc(), > > "s3a:

Re: Spark 2.0: Task never completes

2016-08-03 Thread Utkarsh Sengar
and it fixed the problem. On Wed, Aug 3, 2016 at 10:04 AM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote: > After an upgrade from 1.5.1 to 2.0, one of the tasks never completes and > keeps spilling data to disk overtime. > long count = resultRdd.count(); >

javax.net.ssl.SSLHandshakeException: unable to find valid certification path to requested target

2016-06-20 Thread Utkarsh Sengar
We are intermittently getting this error when Spark tries to load data from S3: Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.