I was passing spark.executor.extraClassPath a comma-separated list of jars,
the way --jars takes them, but it should be colon-separated, like a regular
Java classpath.
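For anyone hitting the same thing, a minimal sketch of the difference (the
paths here are made up for illustration):

    import org.apache.spark.SparkConf;

    // --jars takes a comma-separated list:
    //   spark-submit --jars /opt/libs/logback-classic.jar,/opt/libs/logback-core.jar ...
    // extraClassPath entries are joined with ':' like a normal classpath:
    SparkConf conf = new SparkConf()
        .set("spark.executor.extraClassPath",
             "/opt/libs/logback-classic.jar:/opt/libs/logback-core.jar");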
Thanks for your help!
-Utkarsh
On Mon, Aug 24, 2015 at 5:05 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
I get the same error even when I set the SPARK_CLASSPATH: export
Continuing this discussion:
http://apache-spark-user-list.1001560.n3.nabble.com/same-log4j-slf4j-error-in-spark-9-1-td5592.html
I am getting this error when I use logback-classic.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
On Mon, Aug 24, 2015 at 2:50 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Continuing this discussion:
http://apache-spark-user-list.1001560.n3.nabble.com/same-log4j-slf4j-error-in-spark-9-1-td5592.html
I am getting this error when I use logback-classic.
SLF4J: Class path contains multiple SLF4J bindings.
On Mon, Aug 24, 2015, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
I assumed that's the case because of the error I got and the
documentation, which says: "Extra classpath entries to append to the
classpath of the driver."
This is where I stand now:
<dependency>
  <groupId>org.apache.spark</groupId>
take precedence over the log4j binding embedded in the Spark assembly.
On Mon, Aug 24, 2015 at 3:15 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Hi Marcelo,
When I add this exclusion rule to my pom:
<dependency>
  <groupId>org.apache.spark</groupId>
)
        at com.opentable.logging.AssimilateForeignLoggingHook.automaticAssimilationHook(AssimilateForeignLoggingHook.java:28)
        at com.opentable.logging.Log.<clinit>(Log.java:31)
        ... 16 more
Thanks,
-Utkarsh
On Mon, Aug 24, 2015 at 4:11 PM, Marcelo Vanzin van...@cloudera.com wrote:
On Mon, Aug 24, 2015 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Hi Tim,
Is there any way I can provide more info on this?
On Thu, Oct 1, 2015 at 4:21 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
wrote:
> Not sure what you mean by that; I shared the data I see in the Spark UI.
> Can you point me to a location where I can get precisely the data you need?
I am working on code which uses an executor service to parallelize tasks
(think machine learning computations done over a small dataset, over and over
again).
My goal is to execute some code as fast as possible, multiple times, and
store the result somewhere (total executions will be on the order of 100M
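A rough sketch of that driver-side pattern, with error handling elided;
runModelOnce() and store() are stand-ins for the real work, and the thread
count is illustrative:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<Double>> results = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
        results.add(pool.submit(() -> runModelOnce()));  // hypothetical computation
    }
    for (Future<Double> f : results) {
        store(f.get());                                  // hypothetical result sink
    }
    pool.shutdown();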
So do I need to manually copy these 2 jars onto my Spark executors?
On Tue, Aug 25, 2015 at 10:51 AM, Marcelo Vanzin van...@cloudera.com
wrote:
On Tue, Aug 25, 2015 at 10:48 AM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Now I am going to try it out on our Mesos cluster.
I assumed
at 1:50 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
So do I need to manually copy these 2 jars onto my Spark executors?
Yes. I can think of a way to work around that if you're using YARN,
but not with other cluster managers.
On Tue, Aug 25, 2015 at 10:51 AM, Marcelo Vanzin van...@cloudera.com wrote:
I am trying to run this: a basic mapToPair and then a count() to trigger an
action.
4 executors are launched, but I don't see any relevant logs on those
executors.
It looks like the driver is pulling all the data and it runs out of
memory; the dataset is big, so it won't fit on one machine.
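A minimal sketch of that kind of job, assuming sc is an existing
JavaSparkContext and the input path and element types are made up:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;

    JavaRDD<String> input = sc.textFile("s3a://bucket/input/");  // assumed input
    JavaPairRDD<String, Integer> pairs =
        input.mapToPair(line -> new Tuple2<>(line, 1));          // basic mapToPair
    long n = pairs.count();            // the action that actually triggers the job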
So
> split object is, but it's plausible that the process of creating an array
> of 6.5 million of them is causing you to run out of memory.
>
> I think the reason you don't see anything in the executor logs is that the
> exception is occurring before the work is tasked to the executors.
with one node, right?
>
> And when the job is running, could you open the Spark web UI and get stats
> about the heap size and other Java settings?
>
> Tim
>
> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
> wrote:
>
>> Bumping this one up,
> with fine vs
> coarse-grained mode look like?
>
> Tim
>
> On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
> wrote:
>
>> Bumping it up; it's not really a blocking issue.
>> But fine-grained mode eats up an unpredictable amount of resources in Mesos a
Bumping it up; it's not really a blocking issue.
But fine-grained mode eats up an unpredictable amount of resources in Mesos and
launches tons of tasks, so I would prefer using coarse-grained mode if
only it didn't run out of memory.
Thanks,
-Utkarsh
On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar
If the broadcast variable doesn't fit in memory, I think broadcast is not the
right fit for you.
You can think about fitting it into an RDD as a tuple with the other data you
are working on.
Say you are working on an RDD (rdd in your case); run a map/reduce
to convert it to an RDD of tuples, so now
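One way to read that advice, sketched with assumed class names (Record,
BigModel): pair the large object with each record through a single-element
RDD instead of broadcasting it:

    import java.util.Collections;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    static JavaPairRDD<Record, BigModel> pairWithModel(
            JavaSparkContext sc, JavaRDD<Record> rdd, BigModel model) {
        JavaRDD<BigModel> modelRdd =
            sc.parallelize(Collections.singletonList(model));
        // Every element becomes a (record, model) tuple, so the model travels
        // with the data instead of living in one driver-side broadcast blob:
        return rdd.cartesian(modelRdd);
    }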
We are using "spark-1.4.1-bin-hadoop2.4" on Mesos (not EMR) with S3 to read
and write data and haven't noticed any inconsistencies with it, so 1
(mostly) and 2 definitely should not be a problem.
Regarding 3, are you setting the file system impl in spark config?
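That setting would look roughly like this; the property name and class are
the standard Hadoop ones for the s3n scheme (s3a would use
org.apache.hadoop.fs.s3a.S3AFileSystem instead):

    // On the JavaSparkContext's Hadoop configuration:
    sc.hadoopConfiguration().set("fs.s3n.impl",
        "org.apache.hadoop.fs.s3native.NativeS3FileSystem");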
I am running Spark 1.4.1 on Mesos.
The Spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of sizes
100, 100, 7, and 1 respectively. Let's call it productRDD.
Creation of "aRdd" needs a data pull from multiple data sources, merging them
and creating a tuple of JavaRdd; finally aRDD looks
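The chained cartesian presumably looks something like this (element types A
through D are assumptions, not from the thread):

    // 100 * 100 * 7 * 1 = 70,000 nested tuples in total:
    JavaPairRDD<Tuple2<Tuple2<A, B>, C>, D> productRDD =
        aRdd.cartesian(bRdd).cartesian(cRdd).cartesian(dRdd);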
Bumping this one up; any suggestions on the stack trace?
spark.mesos.coarse=true is not working, and the driver crashed with the
error.
On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
wrote:
> I missed doing a reply-all.
>
> Tim,
>
> spark.mesos.coars
I am trying to load a PMML file in a Spark job, instantiate it only once,
and pass it to the executors. But I get a NotSerializableException for
org.xml.sax.helpers.LocatorImpl, which is used inside jpmml.
I have this class Prediction.java:
public class Prediction implements Serializable {
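A common way around this kind of NotSerializableException, sketched with
assumed names (pmmlPath, loadPmml; Evaluator stands for jpmml's evaluator
type): keep the non-serializable PMML object out of serialization by marking
the field transient and rebuilding it lazily on each executor:

    public class Prediction implements Serializable {
        private final String pmmlPath;               // serializable handle to the model
        private transient volatile Evaluator eval;   // jpmml evaluator, never serialized

        public Prediction(String pmmlPath) {
            this.pmmlPath = pmmlPath;
        }

        private Evaluator evaluator() {
            if (eval == null) {
                synchronized (this) {
                    if (eval == null) {
                        eval = loadPmml(pmmlPath);   // hypothetical loader helper
                    }
                }
            }
            return eval;
        }
    }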
This SO question was asked about a year ago.
http://stackoverflow.com/questions/31799755/how-to-deal-with-tasks-running-too-long-comparing-to-others-in-job-in-yarn-cli
I answered it with a suggestion to try speculation, but it doesn't quite
do what the OP expects. I have been running into
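For reference, speculation is turned on with settings like these; the values
here are illustrative, not from the thread:

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .set("spark.speculation", "true")
        // Consider a task for re-launch when it runs much longer than the median:
        .set("spark.speculation.multiplier", "1.5")
        .set("spark.speculation.interval", "100ms");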
laskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Jun 16, 2016 at 6:45 AM, Utkarsh Sengar <utkarsh2...@gmail.com>
> wrote:
> > This SO question was asked about 1yr ago.
> >
> http:
I am storing a model in S3 at this path:
"bucket_name/p1/models/lr/20160204_0410PM/ser" and the structure of the
saved dir looks like this:
1. bucket_name/p1/models/lr/20160204_0410PM/ser/data -> _SUCCESS,
_metadata, _common_metadata
and
I am trying to debug code executed on executors by logging. Even when I add
log4j's LOG.info(..) inside .map(), I don't see it in the Mesos task logs on
the corresponding slaves.
It's inefficient anyway to keep checking multiple slaves for logs.
One way to deal with this is to push logs to a central
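A sketch of the logging pattern in question, with the logger looked up inside
the closure so it is created on the executor rather than serialized from the
driver (the logger name and transform() are assumptions):

    import org.apache.log4j.Logger;
    import org.apache.spark.api.java.JavaRDD;

    JavaRDD<String> out = rdd.map(x -> {
        Logger log = Logger.getLogger("task-debug");  // resolved on the executor
        log.info("processing: " + x);
        return transform(x);                          // hypothetical per-record work
    });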
The problem turned out to be corrupt parquet data; the error from Spark was a
bit misleading, though.
On Mon, Feb 8, 2016 at 3:41 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
wrote:
> I am storing a model in S3 at this path:
> "bucket_name/p1/models/lr/20160204_0410PM/ser"
Upgraded to Spark 2.0 and tried to load a model:
LogisticRegressionModel model = LogisticRegressionModel.load(sc.sc(),
"s3a://cruncher/c/models/lr/");
Getting this error: Exception in thread "main"
java.lang.IllegalArgumentException: Wrong FS: file://spark-warehouse,
expected: file:///
Full
I don't think it's a related problem, although setting
"spark.sql.warehouse.dir" to /tmp in the Spark config fixed it.
On Tue, Aug 2, 2016 at 5:02 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
wrote:
> Do we have a workaround for this problem?
> Can I overwrite that using some con
it's stalled
> and needs an update.
>
> On Tue, Aug 2, 2016 at 4:47 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
> wrote:
> > Upgraded to spark2.0 and tried to load a model:
> > LogisticRegressionModel model = LogisticRegressionModel.load(sc.sc(),
> > "s3a:
and it fixed the problem.
On Wed, Aug 3, 2016 at 10:04 AM, Utkarsh Sengar <utkarsh2...@gmail.com>
wrote:
> After an upgrade from 1.5.1 to 2.0, one of the tasks never completes and
> keeps spilling data to disk over time.
> long count = resultRdd.count();
>
We are intermittently getting this error when Spark tries to load data from
S3: Caused by: sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target.