As per the recent presentation given at Scala Days (
http://people.apache.org/~marmbrus/talks/SparkSQLScalaDays2014.pdf), it was
mentioned that Catalyst is independent of Spark. But on inspecting the pom.xml
of the sql/catalyst module, it seems it has a dependency on Spark Core. Any
particular reason for
I am trying to serialize objects contained in RDDs using runtime reflection
via TypeTag. However, the Spark job keeps
failing with a java.io.NotSerializableException on an instance of TypeCreator
(auto generated by compiler to enable TypeTags). Is there any workaround
for this without switching to scala
Sometimes it is useful to convert an RDD into a DStream for testing purposes
(generating DStreams from historical data, etc). Is there an easy way to do
this?
I could come up with the following inefficient way but I am not sure if there is
a better way to achieve this. Thoughts?
class
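One simpler option (a rough sketch, not from the original thread; it assumes an existing StreamingContext ssc and a historical RDD historicalRdd of strings) is to feed chunks of the RDD through queueStream, so that each dequeued chunk becomes one micro-batch:

import scala.collection.mutable
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext

def rddToDStream(ssc: StreamingContext, historicalRdd: RDD[String], chunks: Int) = {
  // Split the historical RDD into roughly equal chunks and enqueue them;
  // queueStream then emits one chunk per batch interval.
  val queue = mutable.Queue[RDD[String]]()
  historicalRdd.randomSplit(Array.fill(chunks)(1.0 / chunks)).foreach(queue.enqueue(_))
  ssc.queueStream(queue, oneAtATime = true)
}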
August 2014 13:55, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote:
Sometimes it is useful to convert an RDD into a DStream for testing
purposes (generating DStreams from historical data, etc). Is there an easy
way to do this?
I could come up with the following inefficient way but not sure
*:7077 $SPARK_HOME/bin/spark-shell
Then it will connect to that *whatever* master.
Thanks
Best Regards
On Tue, Aug 5, 2014 at 8:51 PM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
Hi
Apologies if this is a noob question. I have set up Spark 1.0.1 on EMR
using a slightly modified
Hi everyone
I back ported kinesis-asl to spark 1.0.2 and ran a quick test on my local
machine. It seems to be working fine but I keep getting the following
warnings. I am not sure what it means and whether it is something to worry
about or not.
2014-08-22 15:53:43,803 [pool-1-thread-7] WARN
Hi everyone
Sorry about the noob question, but I am struggling to understand ways to
create DStreams in Spark. Here is my understanding based on what I could
gather from documentation and studying Spark code (as well as some hunches).
Please correct me if I am wrong.
1. In most cases, one would
.
Thanks,
Aniket
On 22 August 2014 15:54, Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
Hi everyone
I back ported kinesis-asl to spark 1.0.2 and ran a quick test on my local
machine. It seems to be working fine but I keep getting the following
warnings. I am not sure what it means and whether
I am building (yet another) job server for Spark using Play! framework and
it seems like Play's akka dependency conflicts with Spark's shaded akka
dependency. Using SBT, I can force Play to use akka 2.2.3 (unshaded) but I
haven't been able to figure out how to exclude com.typesafe.akka
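For reference, a minimal build.sbt sketch of excluding a transitive akka artifact from a single dependency (the coordinates and version are illustrative, not taken from this thread):

libraryDependencies += ("org.apache.spark" %% "spark-core" % "1.0.2")
  .excludeAll(ExclusionRule(organization = "com.typesafe.akka"))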
On my local (windows) dev environment, I have been trying to get spark
streaming running to test my real time(ish) jobs. I have set the checkpoint
directory as /tmp/spark and have installed latest cygwin. I keep getting
the following error:
org.apache.hadoop.util.Shell$ExitCodeException: chmod:
Hi everyone
It turns out that I had chef installed and its chmod takes precedence over
cygwin's chmod in the PATH. I fixed the environment
variable and now it's working fine.
On 1 September 2014 11:48, Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
On my local (windows) dev
Hi all
I am struggling to implement a use case wherein I need to trigger an action
in case no data has been received for X amount of time. I haven't been able
to figure out an easy way to do this. No state/foreach methods get called
when no data has arrived. I thought of generating a 'tick'
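For what it is worth, a rough sketch of the 'tick' idea (assuming a StreamingContext ssc with checkpointing enabled and an input stream events: DStream[String]; the key and threshold are illustrative):

import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.ConstantInputDStream

val silenceThresholdMs = 60000L
// Emit one dummy element per batch so that state gets updated even when no data arrives.
val ticks = new ConstantInputDStream(ssc, ssc.sparkContext.parallelize(Seq("tick")))
// Track the wall-clock time of the last real record.
val lastSeen = events.map(e => ("key", Some(System.currentTimeMillis()): Option[Long]))
  .union(ticks.map(_ => ("key", None: Option[Long])))
  .updateStateByKey[Long] { (values: Seq[Option[Long]], state: Option[Long]) =>
    Some(values.flatten.foldLeft(state.getOrElse(0L))(math.max))
  }
lastSeen.foreachRDD { rdd =>
  rdd.collect().foreach { case (_, last) =>
    if (last > 0 && System.currentTimeMillis() - last > silenceThresholdMs) {
      println(s"No data received since $last") // trigger the action here
    }
  }
}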
Sorry about the noob question, but I was just wondering if we use Spark's
ActorSystem (SparkEnv.actorSystem), would it distribute actors across
worker nodes or would the actors only run in the driver JVM?
Hi all
I am trying to run kinesis spark streaming application on a standalone
spark cluster. The job works fine in local mode but when I submit it (using
spark-submit), it doesn't do anything. I enabled logs
for org.apache.spark.streaming.kinesis package and I regularly get the
following in
I am running a simple Spark Streaming program that pulls in data from
Kinesis at a batch interval of 10 seconds, windows it for 10 seconds, maps
data and persists to a store.
The program is running in local mode right now and runs out of memory after
a while. I am yet to investigate heap dumps
:
You could set spark.executor.memory to something bigger than the
default (512mb)
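For example (a sketch; the value and app name are illustrative), in the driver code:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kinesis-streaming-test")
  .set("spark.executor.memory", "2g")

The same can also be passed on the command line via spark-submit's --executor-memory option.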
On Thu, Sep 11, 2014 at 8:31 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
I am running a simple Spark Streaming program that pulls in data from
Kinesis at a batch interval of 10 seconds, windows
Just curious... What's the use case you are looking to implement?
On Sep 11, 2014 10:50 PM, Daniil Osipov daniil.osi...@shazam.com wrote:
Limited memory could also cause you some problems and limit usability. If
you're looking for a local testing environment, vagrant boxes may serve you
much
Dependency hell... My fav problem :).
I had run into a similar issue with hbase and jetty. I can't remember the
exact fix, but here are excerpts from my dependencies that may be relevant:
val hadoop2Common = "org.apache.hadoop" % "hadoop-common" % hadoop2Version
excludeAll(
appreciated
reinis
--
-Original-Nachricht-
Von: Aniket Bhatnagar aniket.bhatna...@gmail.com
An: sp...@orbit-x.de
Cc: user user@spark.apache.org
Datum: 11-09-2014 20:00
Betreff: Re: Re[2]: HBase 0.96+ with Spark 1.0+
Dependency hell... My fav problem :).
I had
...@gmail.com
wrote:
Which version of spark are you running?
If you are running the latest one, then you could try running not a window but
a simple event count on every 2 second batch, and see if you are still
running out of memory?
TD
On Thu, Sep 11, 2014 at 10:34 AM, Aniket Bhatnagar
Kinesis integration.
TD
On Thu, Sep 11, 2014 at 4:51 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
Hi all
I am trying to run kinesis spark streaming application on a standalone
spark cluster. The job works fine in local mode but when I submit it (using
spark-submit), it doesn't do
foreachPartition() + Put(), but your example code can be
used to clean up my code.
BTW, since the data uploaded by Put() goes through normal HBase write
path, it can be slow.
So, it would be nice if bulk-load could be used, since it bypasses the
write path.
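For reference, a minimal foreachPartition() + Put() sketch (the table name, column family and RDD element types are illustrative; the pre-1.0 HBase client API is assumed):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

// rdd is assumed to be an RDD[(String, String)] of (rowKey, value) pairs.
rdd.foreachPartition { rows =>
  // Create the table handle inside the partition so it is not serialized into the closure.
  val table = new HTable(HBaseConfiguration.create(), "my_table")
  rows.foreach { case (rowKey, value) =>
    val put = new Put(Bytes.toBytes(rowKey))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
    table.put(put)
  }
  table.flushCommits()
  table.close()
}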
Thanks.
*From:* Aniket Bhatnagar
on the cluster?
Thanks,
Aniket
On 22 September 2014 18:14, Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
Hi all
I was finally able to figure out why this streaming appeared stuck. The
reason was that I was running out of workers in my standalone deployment of
Spark. There was no feedback
Hi all
Reading through Spark streaming's custom receiver documentation, it is
recommended that onStart and onStop methods should not block indefinitely.
However, looking at the source code of KinesisReceiver, the onStart method
calls worker.run that blocks until worker is shutdown (via a call to
Hi all
I have written a job that reads data from HBASE and writes to HDFS (fairly
simple). While running the job, I noticed that a few of the tasks failed
with the following error. Quick googling on the error suggests that it's an
unexplained error and is perhaps intermittent. What I am curious to
Just curious... Why would you not store the processed results in a regular
relational database? Not sure what you meant by 'persist the appropriate
RDDs'. Did you mean the output of your job will be RDDs?
On 24 October 2014 13:35, ankits ankitso...@gmail.com wrote:
I want to set up spark SQL to allow
.
thanks for posting this, aniket!
-chris
On Fri, Sep 12, 2014 at 5:34 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
Hi all
Sorry but this was totally my mistake. In my persistence logic, I was
creating an async http client instance in RDD foreach but was never closing it,
leading
I am writing yet another Spark job server and have been able to submit jobs
and return/save results. I let multiple jobs use the same spark context but
I set job group while firing each job so that I can in future cancel jobs.
Further, what I desire to do is provide some kind of status
I am trying to figure out if sorting is persisted after applying Pair RDD
transformations and I am not able to decisively tell after reading the
documentation.
For example:
val numbers = .. // RDD of numbers
val pairedNumbers = numbers.map(number => (number % 100, number))
val sortedPairedNumbers
petrella andy.petre...@gmail.com
wrote:
I started some quick hack for that in the notebook, you can head to:
https://github.com/andypetrella/spark-notebook/
blob/master/common/src/main/scala/notebook/front/widgets/SparkInfo.scala
On Tue Nov 18 2014 at 2:44:48 PM Aniket Bhatnagar
aniket.bhatna
ak...@sigmoidanalytics.com
wrote:
If something is persisted you can easily see them under the Storage tab
in the web ui.
Thanks
Best Regards
On Tue, Nov 18, 2014 at 7:26 PM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
I am trying to figure out if sorting is persisted after
/ apache/spark/Aggregator.scala is that your function
will always see the items in the order that they are in the input RDD. An
RDD partition is always accessed as an iterator, so it will not be read out
of order.
On Wed, Nov 19, 2014 at 2:28 PM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote
to add this stuff to the public API.
Any other ideas?
On Tue Nov 18 2014 at 4:03:35 PM Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
Thanks Andy. This is very useful. This gives me all active stages and their
percentage completion, but I am unable to tie stages to a job group (or
specific job
.,
https://github.com/apache/spark/pull/3009
If there is anything that needs to be added, please add it to those issues
or PRs.
On Wed, Nov 19, 2014 at 7:55 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
I have for now submitted a JIRA ticket @
https://issues.apache.org/jira
What's your cluster size? For streaming to work, it needs shards + 1
executors.
On Wed, Nov 26, 2014, 5:53 PM A.K.M. Ashrafuzzaman
ashrafuzzaman...@gmail.com wrote:
Hi guys,
When we are using Kinesis with 1 shard then it works fine. But when we use
more than 1 then it falls into an infinite
-175-5592433
Twitter https://twitter.com/ashrafuzzaman | Blog
http://jitu-blog.blogspot.com/ | Facebook
https://www.facebook.com/ashrafuzzaman.jitu
Check out The Academy http://newscred.com/theacademy, your #1 source
for free content marketing resources
On Wed, Nov 26, 2014 at 11:26 PM, Aniket
I am trying to create (yet another) spark as a service tool that lets you
submit jobs via REST APIs. I think I have nearly gotten it to work barring a
few issues. Some of which seem already fixed in 1.2.0 (like SPARK-2889) but
I have hit the road block with the following issue.
I have created a
inside target/scala-*/projectname-*.jar) and then
use it while submitting. If you are not using spark-submit then you can
simply add this jar to spark by
sc.addJar("/path/to/target/scala-*/projectname-*.jar")
Thanks
Best Regards
On Mon, Dec 8, 2014 at 7:23 PM, Aniket Bhatnagar
aniket.bhatna
I am running spark 1.1.0 on AWS EMR and I am running a batch job that
seems to be highly parallelizable in yarn-client mode. But Spark
stops spawning any more executors after spawning 6 executors even though
YARN cluster has 15 healthy m1.large nodes. I even tried providing
'--num-executors
The reason is because of the following code:
val numStreams = numShards
val kinesisStreams = (0 until numStreams).map { i =>
KinesisUtils.createStream(ssc, streamName, endpointUrl,
kinesisCheckpointInterval,
InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
}
In the above
Try the workaround (addClassPathJars(sparkContext,
this.getClass.getClassLoader)) discussed in
http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAJOb8buD1B6tUtOfG8_Ok7F95C3=r-zqgffoqsqbjdxd427...@mail.gmail.com%3E
Thanks,
Aniket
On Mon Dec 15 2014 at 07:43:24 Tomoya Igarashi
The reason for not using sc.newAPIHadoopRDD is that it only supports one scan
each time.
I am not sure that's true. You can use multiple scans as follows:
val scanStrings = scans.map(scan => convertScanToString(scan))
conf.setStrings(MultiTableInputFormat.SCANS, scanStrings : _*)
where
In case you are still looking for help, there has been multiple discussions
in this mailing list that you can try searching for. Or you can simply use
https://github.com/unicredit/hbase-rdd :-)
Thanks,
Aniket
On Wed Dec 03 2014 at 16:11:47 Ted Yu yuzhih...@gmail.com wrote:
Which hbase release
/TomoyaIgarashi/9688bdd5663af95ddd4d
Is there any problem?
2014-12-15 18:48 GMT+09:00 Aniket Bhatnagar aniket.bhatna...@gmail.com:
Try the workaround (addClassPathJars(sparkContext,
this.getClass.getClassLoader)) discussed in
http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox
It turns out that this happens when the checkpoint is set to a local directory
path. I have opened JIRA SPARK-4862 for Spark streaming to output a better
error message.
Thanks,
Aniket
On Tue Dec 16 2014 at 20:08:13 Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
I am using spark 1.1.0 running
Hi guys
I am hoping someone might have a clue on why this is happening. Otherwise I
will have to delve into the YARN module's source code to better understand the
issue.
On Wed, Dec 10, 2014, 11:54 PM Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
I am running spark 1.1.0 on AWS EMR and I am
I would think that it has to be per worker.
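A common way to get per-executor (per-JVM) initialization is a lazy val inside an object (a sketch; SomeClient and its send method are hypothetical):

object SharedResource {
  // Initialized at most once per JVM (i.e. once per executor), never serialized from the driver.
  lazy val client = new SomeClient()
}

someRdd.foreachPartition { records =>
  records.foreach(r => SharedResource.client.send(r))
}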
On Wed, Dec 17, 2014, 6:32 PM Ashic Mahtab as...@live.com wrote:
Hello,
Say, I have the following code:
let something = Something()
someRdd.foreachRdd(something.someMethod)
And in something, I have a lazy member variable that gets created in
Hi all
I just realized that the spark-yarn artifact hasn't been published for the 1.2.0
release. Any particular reason for that? I was using it in my yet another
spark-job-server project to submit jobs to a YARN cluster through
convenient REST APIs (with some success). The job server was creating
/JW1q5vd61V1/Spark-yarn+1.2.0subj=Re+spark+yarn_2+10+1+2+0+artifacts
Cheers
On Dec 28, 2014, at 11:13 PM, Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
Hi all
I just realized that the spark-yarn artifact hasn't been published for the 1.2.0
release. Any particular reason for that? I was using
Hi Josh
Is there documentation available for status API? I would like to use it.
Thanks,
Aniket
On Sun Dec 28 2014 at 02:37:32 Josh Rosen rosenvi...@gmail.com wrote:
The console progress bars are implemented on top of a new stable status
API that was added in Spark 1.2. It's possible to
little to no risk to introduce new bugs elsewhere in Spark.
On Mon, Dec 29, 2014 at 3:08 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
Hi Josh
Is there documentation available for status API? I would like to use it.
Thanks,
Aniket
On Sun Dec 28 2014 at 02:37:32 Josh Rosen
Did you check firewall rules in security groups?
On Tue, Dec 30, 2014, 9:34 PM Laeeq Ahmed laeeqsp...@yahoo.com.invalid
wrote:
Hi,
I am using spark standalone on EC2. I can access ephemeral hdfs from
spark-shell interface but I can't access hdfs in standalone application. I
am using spark
I am trying to read a file into a single partition but it seems like
sparkContext.textFile ignores the passed minPartitions value. I know I can
repartition the RDD but I was curious to know if this is expected or if
this is a bug that needs to be further investigated?
number of partitions value is to increase the number of partitions,
not reduce it from the default value.
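If a single partition is really needed, one option (a sketch; the path is illustrative) is to coalesce after reading:

val singlePartition = sc.textFile("hdfs:///path/to/file").coalesce(1)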
On Thu, Jan 1, 2015 at 10:43 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
I am trying to read a file into a single partition but it seems like
sparkContext.textFile ignores the passed
On Fri Jan 30 2015 at 23:29:25 Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
Right. Which makes me believe that the directory is perhaps configured
somewhere and I have missed configuring the same. The process that is
submitting jobs (basically becomes driver) is running in sudo mode
When you repartition, ordering can get lost. You would need to sort after
repartitioning.
Aniket
On Tue, Jan 20, 2015, 7:08 AM anny9699 anny9...@gmail.com wrote:
Hi,
I am using Spark on AWS and want to write the output to S3. It is a
relatively small file and I don't want them to output as
Sorry. I couldn't understand the issue. Are you trying to send data to
kinesis from a spark batch/real time job?
- Aniket
On Fri, Jan 16, 2015, 9:40 PM Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
Hi Experts!
I am using kinesis dependency as follow
groupId = org.apache.spark
artifactId =
Are you using spark in standalone mode or yarn or mesos? If its yarn,
please mention the hadoop distribution and version. What spark distribution
are you using (it seems 1.2.0 but compiled with which hadoop version)?
Thanks, Aniket
On Thu, Jan 15, 2015, 4:59 PM Hafiz Mujadid
(ThreadPoolExecutor.java:615)
[na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
On Wed Jan 21 2015 at 17:34:34 Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
While implementing a spark server, I realized that Thread's context loader
must be set to any dynamically loaded
While implementing a spark server, I realized that Thread's context loader
must be set to any dynamically loaded classloader so that ClosureCleaner
can do its thing. Should the ClosureCleaner not use the classloader created by
SparkContext (that has all dynamically added jars via SparkContext.addJar)
Sure there is. Create a new RDD just containing the schema line (hint: use
sc.parallelize) and then union both the RDDs (the header RDD and data RDD)
to get a final desired RDD.
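A minimal sketch of the above (schemaLine and dataRdd are illustrative names):

val headerRdd = sc.parallelize(Seq(schemaLine))
// The header RDD comes first in the union, so its single partition (and the schema line) is written first.
val withHeader = headerRdd.union(dataRdd)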
On Thu Jan 15 2015 at 19:48:52 Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
hi experts!
I have an RDD[String] and I
This might be helpful:
http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job
On Tue Jan 27 2015 at 07:45:18 Sharon Rapoport sha...@plaid.com wrote:
Hi,
I have an rdd of [k,v] pairs. I want to save each [v] to a file named [k].
I got them by
for it.
-Sven
On Fri, Jan 30, 2015 at 6:44 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
I am programmatically submitting spark jobs in yarn-client mode on EMR.
Whenever a job tries to save file to s3, it gives the below mentioned
exception. I think the issue might be what EMR
the
Spark assembly first in the classpath fixes the issue. I expect that the
version of Parquet that's being included in the EMR libs just needs to be
upgraded.
~ Jonathan Kelly
From: Aniket Bhatnagar aniket.bhatna...@gmail.com
Date: Sunday, January 4, 2015 at 10:51 PM
To: Adam Gilmore
It seems that spark-network-yarn compiled for scala 2.11 depends on
spark-network-shuffle compiled for scala 2.10. This causes cross version
dependencies conflicts in sbt. Seems like a publishing error?
http://www.uploady.com/#!/download/6Yn95UZA0DR/3taAJFjCJjrsSXOR
This is tricky to debug. Check logs of node and resource manager of YARN to
see if you can trace the error. In the past, I have had to closely look at the
arguments getting passed to the YARN container (they get logged before
attempting to launch containers). If I still didn't get a clue, I had to
check the
Go through the Spark API documentation. Basically you have to do a group by
(date, message_type) and then do a count.
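A minimal sketch of that (field names are illustrative):

val counts = rows
  .map(row => ((row.date, row.messageType), 1L))
  .reduceByKey(_ + _) // count per (date, message_type)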
On Sun, Jan 4, 2015, 9:58 PM Dinesh Vallabhdas dines...@yahoo.com.invalid
wrote:
A spark cassandra newbie question. Thanks in advance for the help.
I have a cassandra table with 2
libraries are java-only (the scala version appended there is just for
helping the build scripts).
But it does look weird, so it would be nice to fix it.
On Wed, Jan 7, 2015 at 12:25 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
It seems that spark-network-yarn compiled for scala 2.11
Can you confirm your emr version? Could it be because of the classpath
entries for emrfs? You might face issues with using S3 without them.
Thanks,
Aniket
On Mon, Jan 5, 2015, 11:16 AM Adam Gilmore dragoncu...@gmail.com wrote:
Just an update on this - I found that the script by Amazon was the
I have a job that sorts data and runs a combineByKey operation and it
sometimes fails with the following error. The job is running on spark 1.2.0
cluster with yarn-client deployment mode. Any clues on how to debug the
error?
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
I am aggregating a dataset using combineByKey method and for a certain
input size, the job fails with the following error. I have enabled heap
dumps to better analyze the issue and will report back if I have any
findings. Meanwhile, if you guys have any idea of what could possibly
result in this
the hashCode and equals method correctly.
On Wednesday, April 15, 2015 at 3:10 PM, Aniket Bhatnagar wrote:
I am aggregating a dataset using combineByKey method and for a certain
input size, the job fails with the following error. I have enabled heap
dumps to better analyze the issue
and test it out?
On Thu, May 21, 2015 at 1:43 AM, Tathagata Das t...@databricks.com
wrote:
Thanks for the JIRA. I will look into this issue.
TD
On Thu, May 21, 2015 at 1:31 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
I ran into one of the issues that are potentially caused because
Are you sure RedisClientPool is being initialized properly in the
constructor of RedisCache? Can you please copy paste the code that you use
to initialize RedisClientPool inside the constructor of RedisCache?
Thanks,
Aniket
On Fri, Oct 23, 2015 at 11:47 AM Bin Wang wrote:
>
Can you try storing the state (word count) in an external key value store?
On Sat, Nov 7, 2015, 8:40 AM Thúy Hằng Lê wrote:
> Hi all,
>
> Anyone could help me on this. It's a bit urgent for me on this.
> I'm very confused and curious about Spark data checkpoint
}
> }
>
> Without using updateStateByKey, I only have the stats of the last
> micro-batch.
>
> Any advise on this?
>
>
> 2015-11-07 11:35 GMT+07:00 Aniket Bhatnagar <aniket.bhatna...@gmail.com>:
>
>> Can you try storing the state (word count) in an exter
rs to be the same. I was hoping an example might
> shed some light on the issue.
>
> Regards,
>
> Bryan Jeffrey
>
> On Thu, Oct 8, 2015 at 7:04 AM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
>> Here is an example:
Is it possible for you to use EMR instead of EC2? If so, you may be able to
tweak EMR bootstrap scripts to install your custom spark build.
Thanks,
Aniket
On Thu, Oct 8, 2015 at 5:58 PM Theodore Vasiloudis <
theodoros.vasilou...@gmail.com> wrote:
> Hello,
>
> I was wondering if there is an easy
Can you try changing classOf[OutputFormat[String,
BoxedUnit]] to classOf[OutputFormat[String,
Put]] while configuring hconf?
On Sat, Oct 17, 2015, 11:44 AM Amit Hora wrote:
> Hi,
>
> Regrets for the delayed response,
> please find below the full stack trace
>
>
Hi Tom
There has to be a difference in classpaths in yarn-client and yarn-cluster
mode. Perhaps a good starting point would be to print classpath as a first
thing in SimpleApp.main. It should give clues around why it works in
yarn-cluster mode.
Thanks,
Aniket
On Wed, Sep 9, 2015, 2:11 PM Tom
You can perhaps set up a WAL that logs to S3? The new cluster should pick up
the records that weren't processed due to the previous cluster termination.
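A rough sketch of enabling the receiver write-ahead log (the app name and checkpoint directory are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("kinesis-wal-example")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(10))
// Received data is logged under this directory before being processed.
ssc.checkpoint("s3://my-bucket/spark-checkpoints")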
Thanks,
Aniket
On Thu, Sep 17, 2015, 9:19 PM Alan Dipert wrote:
> Hello,
> We are using Spark Streaming 1.4.1 in AWS EMR to process records
Hi Rachna
Can you just use the http client provided via spark's transitive dependencies
instead of excluding them?
The reason user-classpath-first is failing could be because you have spark
artifacts in your assembly jar that don't match what is deployed
(version mismatch or you built the version
reduceByKey(_ + _)
>
> val output = wordCounts.
>map({case ((user, word), count) => (user, (word, count))}).
>groupByKey()
>
> By Aniket, if we group by user first, it could run out of memory when
> spark tries to put all words in a single sequence, couldn't it?
>
way to do so?
>
> best,
> /Shahab
>
> On Fri, Sep 18, 2015 at 12:54 PM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
>> Can you try yarn-client mode?
>>
>> On Fri, Sep 18, 2015, 3:38 PM shahab <shahab.mok...@gmail.com> wrote:
>>
>>
Can you try yarn-client mode?
On Fri, Sep 18, 2015, 3:38 PM shahab wrote:
> Hi,
>
> Probably I have wrong zeppelin configuration, because I get the following
> error when I execute spark statements in Zeppelin:
>
> org.apache.spark.SparkException: Detected yarn-cluster
Hi Eval
Can you check if your Ubuntu VM has enough RAM allocated to run a JVM of size
3 GB?
thanks,
Aniket
On Sat, Sep 19, 2015, 9:09 PM Eyal Altshuler
wrote:
> Hi,
>
> I had configured the MAVEN_OPTS environment variable the same as you wrote.
> My java version is
Using the Scala API, you can first group by user and then use combineByKey.
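A sketch of what that could look like (names and types are illustrative; userWords is assumed to be an RDD[(String, String)] of (user, word) pairs):

val userWordCounts = userWords.combineByKey[Map[String, Long]](
  (word: String) => Map(word -> 1L),
  (acc: Map[String, Long], word: String) => acc + (word -> (acc.getOrElse(word, 0L) + 1L)),
  (a: Map[String, Long], b: Map[String, Long]) =>
    (a.keySet ++ b.keySet).map(w => w -> (a.getOrElse(w, 0L) + b.getOrElse(w, 0L))).toMap
)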
Thanks,
Aniket
On Sat, Sep 19, 2015, 6:41 PM kali.tumm...@gmail.com
wrote:
> Hi All,
> I would like to achieve this below output using spark , I managed to write
> in Hive and call it in spark but
Are you running on YARN or standalone?
On Thu, Dec 31, 2015, 3:35 PM LinChen wrote:
> *Screenshot1(Normal WebUI)*
>
>
>
> *Screenshot2(Corrupted WebUI)*
>
>
>
> As screenshot2 shows, the format of my Spark WebUI looks strange and I
> cannot click the description of active
You can't assume that the number to nodes will be constant as some may
fail, hence you can't guarantee that a function will execute at most once
or at least once on a node. Can you explain your use case in a bit more
detail?
On Mon, Jul 18, 2016, 10:57 PM joshuata wrote:
>
ould allow us to read the data in situ without a copy.
>
> I understand that manually assigning tasks to nodes reduces fault
> tolerance, but the simulation codes already explicitly assign tasks, so a
> failure of any one node is already a full-job failure.
>
> On Mon, J
In order to know what's going on, you can study the thread dumps either
from spark UI or from any other thread dump analysis tool.
Thanks,
Aniket
On Sun, Nov 6, 2016 at 1:31 PM Michael Johnson
wrote:
> I'm doing some processing and then clustering of a small
Hello
The dynamic allocation feature allows you to add executors and scale
computation power. This is great; however, I feel like we also need a way
to dynamically scale storage. Currently, if the disk is not able to hold
the spilled/shuffle data, the job is aborted causing frustration and loss
of
On Sunday, November 6, 2016 8:36 AM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
>
> In order to know what's going on, you can study the thread dumps either
> from spark UI or from any other thread dump analysis tool.
>
> Thanks,
> Aniket
If people agree that this is desired, I am willing to submit a SIP for this and
find time to work on it.
Thanks,
Aniket
On Sun, Nov 6, 2016 at 1:06 PM Aniket Bhatnagar <aniket.bhatna...@gmail.com>
wrote:
> Hello
>
> Dynamic allocation feature allows you to add executors and scale
>
> fails because the number of partitions is less? Or you want to do it for a
> perf gain?
>
>
>
> Also, what were your initial Dataset partitions and how many did you have
> for the result of join?
>
>
>
> *From:* Aniket Bhatnagar [mailto:aniket.bhatna...@gmail.com]
> *Sent
f data to be on one executor node.
Sent from my Windows 10 phone
*From: *Rodrick Brown <rodr...@orchardplatform.com>
*Sent: *Friday, November 25, 2016 12:25 AM
*To: *Aniket Bhatnagar <aniket.bhatna...@gmail.com>
*Cc: *user <user@spark.apache.org>
*Subject: *Re: OS killing Executor due to
Try changing compression to bzip2 or lzo. For reference -
http://comphadoop.weebly.com
Thanks,
Aniket
On Mon, Nov 21, 2016, 10:18 PM yeshwanth kumar
wrote:
> Hi,
>
> we are running Hive on Spark, we have an external table over snappy
> compressed csv file of size 917.4 M
Hi Spark users
I am running a job that does join of a huge dataset (7 TB+) and the
executors keep crashing randomly, eventually causing the job to crash.
There are no out of memory exceptions in the log and looking at the dmesg
output, it seems like the OS killed the JVM because of high memory
> happens again I will try grabbing thread dumps and I will see if I can
> figure out what is going on.
>
>
> On Sunday, November 6, 2016 10:02 AM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
>
> I doubt it's GC as you mentioned that the pause is several mi
Also, how are you launching the application? Through spark-submit or by
creating the spark context in your app?
Thanks,
Aniket
On Wed, Nov 16, 2016 at 10:44 AM Aniket Bhatnagar <
aniket.bhatna...@gmail.com> wrote:
> Thanks for sharing the thread dump. I had a look at them and couldn't find
&