Hi there,
I have been getting a strange error in Spark 1.6.1.
The submitted job uses only the executor launched on the master node, while
the other workers stay idle.
When I check the web UI to investigate the killed executors, I see the
error:
Error: invalid log directory
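One thing that may be worth checking here (an assumption on my part, not something confirmed in this thread): whether the worker's work directory, where standalone executors write their stdout/stderr logs, exists and is writable on every node. It can be pointed somewhere explicit in conf/spark-env.sh:

```properties
# conf/spark-env.sh on each worker node; the path is illustrative
export SPARK_WORKER_DIR=/var/spark/work   # executor scratch space and logs
```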
Hi Mike,
I am forwarding you an email I sent a while ago regarding some related work
I did; I hope you find it useful.
Hi Matt,
there is some related work I did recently at IBM Research on visualizing
the metrics produced.
You can read about it here
http://www.spark.tc/sparkoscope-enabling-spark-optimization-through-cross-stack-monitoring-and-visualization-2/
We recently open-sourced it if you are interested to
Hi all,
I recently sent this contribution to the dev mailing list, but I thought it
might be useful to post it here too, since I have seen a lot of people
asking about OS-level metrics in Spark. This is the result of the work we
have been doing recently at IBM Research around Spark.
Hi,
I am exploring the dynamic resource allocation provided by the standalone
cluster mode, and I was wondering whether the behavior I am experiencing is
expected.
In my configuration I have 3 slaves with 24 cores each.
I have in my spark-defaults.conf:
spark.shuffle.service.enabled true
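For reference, dynamic allocation on Spark 1.x standalone is typically enabled by pairing the shuffle service flag with the allocation settings. A minimal spark-defaults.conf sketch (the min/max values here are illustrative, not from this thread):

```properties
spark.shuffle.service.enabled         true
spark.dynamicAllocation.enabled       true
spark.dynamicAllocation.minExecutors  1
spark.dynamicAllocation.maxExecutors  72
```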
Hi there,
I have been using Spark 1.5.2 on my cluster without a problem and wanted to
try Spark 1.6.0.
I have the exact same configuration on both clusters.
I am able to start the standalone cluster, but submitting a job fails with
errors like the following:
16/01/05 14:24:14 INFO
> On Tue, Jan 5, 2016 at 9:01 AM, Yiannis Gkoufas <johngou...@gmail.com>
> wrote:
>
>> Hi Dean,
>>
>> thanks so much for the response! It works
Hi there,
I have my data stored in HDFS partitioned by month in Parquet format.
The directory looks like this:
-month=201411
-month=201412
-month=201501
-
I want to compute some aggregates for every timestamp.
How is it possible to achieve that by taking advantage of the existing
distinct keys: collect them on the driver and loop over the keys, filtering
a new RDD out of the original one by each key:
for (key <- keys) {
  // keyOf and the output path are illustrative placeholders
  rdd.filter(r => keyOf(r) == key).saveAsTextFile("/output/" + key)
}
It might help to cache the original RDD first.
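The collect-the-keys-then-filter pattern can be sketched end to end. The chain below runs on a plain Scala collection so it needs no cluster; on an RDD the same map/distinct/filter calls apply (with distinct().collect() to pull the keys to the driver, and saveAsTextFile instead of building a Map). All names and values are illustrative:

```scala
// Records keyed by month, mimicking rows of an RDD.
val records = Seq(("201411", "a"), ("201411", "b"), ("201412", "c"))

// Step 1: find the distinct keys (on an RDD: rdd.map(_._1).distinct().collect()).
val keys = records.map(_._1).distinct

// Step 2: build one filtered subset per key (on an RDD each subset would be
// written out with saveAsTextFile instead of collected into a Map).
val byKey: Map[String, Seq[(String, String)]] =
  keys.map(k => k -> records.filter(_._1 == k)).toMap
```

Note that this makes one full pass over the data per key, which is why caching the original RDD matters.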
On 16 Jul 2015, at 12:21, Yiannis Gkoufas johngou...@gmail.com
, but it might happen that one partition contains more than one key (with
values). I'm not sure about this, but it shouldn't be a big deal, as you
would iterate over the (key, Iterable[value]) tuples and store each key to
a specific file.
On 15 Jul 2015, at 03:23, Yiannis Gkoufas johngou...@gmail.com wrote:
Hi there,
I have been using the approach described here:
http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job
In addition to that, I was wondering if there is a way to customize the
order of the values contained in each file.
Thanks a lot!
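On the ordering question, one option (an assumption on my part, not something I have tested at scale) is to group by key and sort each group's values before writing, so every per-key file comes out ordered. The core transformation, shown on plain Scala collections so it runs without Spark; on a pair RDD the analogous chain would be groupByKey followed by sorting each Iterable:

```scala
val pairs = Seq(("k1", 3), ("k1", 1), ("k2", 2), ("k1", 2))

// Group values by key and sort within each group; the sorted sequence is
// what each key's output file would then contain.
val sortedPerKey: Map[String, Seq[Int]] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sorted }
```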
Hi there,
I would recommend checking out
https://github.com/spark-jobserver/spark-jobserver which I think gives the
functionality you are looking for.
I haven't tested it though.
BR
On 5 June 2015 at 01:35, Olivier Girardot ssab...@gmail.com wrote:
You can use it as a broadcast variable, but
(default: total memory minus 1 GB); note that each
application's individual memory is configured using its
spark.executor.memory property.
Hi all,
I have 6 nodes in the cluster, and one of the nodes is clearly
under-performing.
I was wondering what the impact of such an issue is. Also, what is the
recommended way to work around it?
Thanks a lot,
Yiannis
the same error. :(
Thanks a lot!
On 19 March 2015 at 17:40, Yiannis Gkoufas johngou...@gmail.com wrote:
Hi Yin,
thanks a lot for that! Will give it a shot and let you know.
On 19 March 2015 at 16:30, Yin Huai yh...@databricks.com wrote:
Was the OOM thrown during the execution of the first stage
Actually I realized that the correct way is:
sqlContext.sql("set spark.sql.shuffle.partitions=1000")
but I am still experiencing the same behavior/error.
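For completeness, and as a suggestion about the setup rather than a confirmed fix: the same setting can also be made permanent in spark-defaults.conf instead of per session:

```properties
spark.sql.shuffle.partitions   1000
```

In Spark 1.x it can likewise be set programmatically with sqlContext.setConf("spark.sql.shuffle.partitions", "1000").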
On 20 March 2015 at 16:04, Yiannis Gkoufas johngou...@gmail.com wrote:
Hi Yin,
the way I set the configuration is:
val sqlContext = new
it until the OOM disappears. Hopefully this
will help.
Thanks,
Yin
On Wed, Mar 18, 2015 at 4:46 PM, Yiannis Gkoufas johngou...@gmail.com
wrote:
Hi Yin,
Thanks for your feedback. I have 1700 Parquet files, 100 MB each.
The number of tasks launched is equal to the number of Parquet
Hi there,
I was trying the new DataFrame API with some basic operations on a parquet
dataset.
I have 7 nodes of 12 cores and 8GB RAM allocated to each worker in a
standalone cluster mode.
The code is the following:
val people = sqlContext.parquetFile("/data.parquet")
val res =
/latest/configuration.html
Cheng
On 3/18/15 9:15 PM, Yiannis Gkoufas wrote:
in the
second stage. Can you take a look at the numbers of tasks launched in these
two stages?
Thanks,
Yin
On Wed, Mar 18, 2015 at 11:42 AM, Yiannis Gkoufas johngou...@gmail.com
wrote:
Hi there, I set the executor memory to 8g but it didn't help
On 18 March 2015 at 13:59, Cheng Lian lian.cs
Hi all,
I have downloaded version 1.3.0-rc1 from
https://github.com/apache/spark/archive/v1.3.0-rc1.zip, extracted it and
built it using:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package
It doesn't complain about any issues, but when I call sbin/start-all.sh I
get the following in the logs:
, Yiannis Gkoufas johngou...@gmail.com wrote:
Sorry for the mistake, I actually have it this way:
val myObject = new MyObject()
val myObjectBroadcasted = sc.broadcast(myObject)
val rdd1 = sc.textFile("/file1").map(e => {
  myObjectBroadcasted.value.insert(e._1)
  (e._1, 1)
})
rdd1.cache.count
directly, which won't work because you are modifying the serialized copy
on the executor. You want to use myObjectBroadcasted.value.insert and
myObjectBroadcasted.value.lookup instead.
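The failure mode described here, each task mutating a deserialized copy rather than the driver's object, can be simulated without Spark. This sketch just imitates the copy-on-ship behavior with clone; it is an illustration of the concept, not Spark's actual serialization path:

```scala
import scala.collection.mutable

// Stand-in for the object captured by the closure on the driver.
val driverObject = mutable.Set[String]()

// A "task": receives a copy of the object, the way an executor receives
// a deserialized copy of the closure's captured variables.
def runTask(obj: mutable.Set[String], elems: Seq[String]): mutable.Set[String] = {
  val executorCopy = obj.clone()        // imitates serialize + deserialize
  elems.foreach(executorCopy += _)      // the mutation lands on the copy only
  executorCopy
}

val taskResult = runTask(driverObject, Seq("a", "b"))
// taskResult holds "a" and "b", but driverObject is still empty.
```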
-Original Message-
*From: *Yiannis Gkoufas [johngou...@gmail.com]
*Sent
Hi all,
I am trying to do the following.
val myObject = new MyObject()
val myObjectBroadcasted = sc.broadcast(myObject)
val rdd1 = sc.textFile("/file1").map(e => {
  myObject.insert(e._1)
  (e._1, 1)
})
rdd1.cache.count() // to make sure it is transformed
val rdd2 = sc.textFile("/file2").map(e => {
Hi there,
I assume you are using Spark 1.2.1, right?
I faced the exact same issue, and switching to 1.1.1 with the same
configuration solved it.
On 24 Feb 2015 19:22, Ted Yu yuzhih...@gmail.com wrote:
Here is a tool which may give you some clue:
http://file-leak-detector.kohsuke.org/
if there's a bug report for this regression? For some
other (possibly connected) reason I upgraded from 1.1.1 to 1.2.1, but I
can't remember what the bug was.
Joe
On 24 February 2015 at 19:26, Yiannis Gkoufas johngou...@gmail.com
wrote:
Hi,
I have experienced the same behavior. You are talking about standalone
cluster mode right?
BR
On 21 February 2015 at 14:37, Deep Pradhan pradhandeep1...@gmail.com
wrote:
Hi,
I have been running some jobs on my local single-node standalone cluster.
I am varying the number of worker instances for
Hi there,
I am trying to increase the number of executors per worker in standalone
mode, but I have failed to achieve that.
I followed a bit the instructions of this thread:
http://stackoverflow.com/questions/26645293/spark-configuration-memory-instance-cores
and did that:
spark.executor.memory 1g
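For what it's worth, the usual workaround on standalone clusters of that era was to run several worker instances per node rather than several executors per worker. A conf/spark-env.sh sketch (the values are illustrative and should be sized so that instances times cores, and instances times memory, fit the machine):

```properties
# conf/spark-env.sh on each slave node (illustrative values)
export SPARK_WORKER_INSTANCES=2   # two workers, hence two executors per app per node
export SPARK_WORKER_CORES=12      # cores handed to each worker
export SPARK_WORKER_MEMORY=8g     # memory handed to each worker
```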
memory to Spark on each worker node. Nothing to do with
# of executors.
Mohammed
*From:* Yiannis Gkoufas [mailto:johngou...@gmail.com]
*Sent:* Friday, February 20, 2015 4:55 AM
*To:* user@spark.apache.org
*Subject:* Setting the number of executors in standalone mode
Hi there,
I