Awesome, thanks for the PR Koert!
/Anders
On Thu, Dec 17, 2015 at 10:22 PM Prasad Ravilla <pras...@slalom.com> wrote:
> Thanks, Koert.
>
> Regards,
> Prasad.
>
> From: Koert Kuipers
> Date: Thursday, December 17, 2015 at 1:06 PM
> To: Prasad Ravilla
> Cc: An
lobs (or multiple paths comma separated) very
>> efficiently. AvroRelation should just pass the paths (comma separated).
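As a sketch of what that looks like from the caller's side (the paths are hypothetical placeholders, and whether a comma-separated path string is accepted depends on the spark-avro version in use):

```scala
// Hypothetical sketch: loading several Avro paths in one read.
// The comma-separated path string follows the suggestion above;
// the paths are made-up placeholders, not real locations.
val df = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("/data/events/2015-10-01,/data/events/2015-10-02")
```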
>>
>>
>>
>>
>> On Thu, Oct 22, 2015 at 1:37 PM, Anders Arpteg <arp...@spotify.com>
>> wrote:
>>
>>> Yes, seems unnecessary.
so. killing it...
>
> On Thu, Sep 24, 2015 at 1:24 PM, Anders Arpteg <arp...@spotify.com> wrote:
>
>> Hi,
>>
>> Running Spark 1.5.0 in yarn-client mode, and am curious why there are
>> so many broadcasts being done when loading datasets with large number o
Hi,
Received the following error when reading an Avro source with Spark 1.5.0
and the com.databricks.spark.avro reader. In the data source, there is one
nested field named "UserActivity.history.activity" and another named
"UserActivity.activity". This seems to be the reason for the exception,
Hi,
Running Spark 1.5.0 in yarn-client mode, and am curious why there are so
many broadcasts being done when loading datasets with large number of
partitions/files. Have datasets with thousands of partitions, i.e. hdfs
files in the avro folder, and sometimes loading hundreds of these large
Ok, thanks Reynold. When I tested dynamic allocation with Spark 1.4, it
complained saying that it was not Tungsten compliant. Let's hope it works
with 1.5 then!
On Tue, Sep 8, 2015 at 5:49 AM Reynold Xin <r...@databricks.com> wrote:
>
> On Wed, Sep 2, 2015 at 12:03 AM, Anders
On Tue, Sep 1, 2015 at 8:03 AM, Anders Arpteg <arp...@spotify.com> wrote:
> > A fix submitted less than one hour after my mail, very impressive Davies!
> > I've compiled your PR and tested it with the large job that failed
> before,
> > and it seems to work fine
dav...@databricks.com> wrote:
> I had sent out a PR [1] to fix 2), could you help to test that?
>
> [1] https://github.com/apache/spark/pull/8543
>
> On Mon, Aug 31, 2015 at 12:34 PM, Anders Arpteg <arp...@spotify.com>
> wrote:
Was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle
manager. One problem was when using the com.databricks.spark.avro reader
and the error(1) was received, see stack trace below. The problem does not
occur with the "sort" shuffle manager.
Another problem was in a large
/15 5:52 PM, Anders Arpteg wrote:
Yes, both the driver and the executors. Works a little bit better with
more space, but still a leak that will cause failure after a number of
reads. There are about 700 different data sources that need to be loaded,
lots of data...
Thu 25 Jun 2015 08:02 ...@manthan.com wrote:
Did you try increasing the perm gen for the driver?
Regards
Sab
On 24-Jun-2015 4:40 pm, Anders Arpteg arp...@spotify.com wrote:
When reading large (and many) datasets with the Spark 1.4.0 DataFrames
parquet reader (the org.apache.spark.sql.parquet format), the following
exceptions are thrown:
Exception in thread task-result-getter-0
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread
, 2015 at 8:45 PM, Yin Huai yh...@databricks.com wrote:
Does it happen every time you read a parquet source?
On Tue, Jun 2, 2015 at 3:42 AM, Anders Arpteg arp...@spotify.com wrote:
The log is from the log aggregation tool (hortonworks, yarn logs ...),
so both executors and driver. I'll send
Just compiled Spark 1.4.0-rc3 for Yarn 2.2 and tried running a job that
worked fine for Spark 1.3. The job starts on the cluster (yarn-cluster
mode), initial stage starts, but the job fails before any task succeeds
with the following error. Any hints?
[ERROR] [06/02/2015 09:05:36.962] [Executor
,
Shixiong Zhu
2015-06-02 17:11 GMT+08:00 Anders Arpteg arp...@spotify.com:
the capacity scheduler or fifo scheduler without multi
resource scheduling by any chance?
On Thu, Feb 12, 2015 at 1:51 PM, Anders Arpteg arp...@spotify.com wrote:
The NM logs only seem to contain something similar to the following. Nothing else
in the same time range. Any help?
2015-02-12 20:47:31,245 WARN
, Anders Arpteg arp...@spotify.com
wrote:
Sounds very similar to what I experienced Corey. Something that seems to
at least help with my problems is to have more partitions. I'm already
fighting between ending up with too many partitions in the end and having
too few in the beginning
persistence kick in at that point?
On Sat, Feb 21, 2015 at 11:20 AM, Anders Arpteg arp...@spotify.com
wrote:
the memory that each executor has
allocated it happens in earlier stages but I can't seem to find anything
that says an executor (or container for that matter) has run low on memory.
On Mon, Feb 23, 2015 at 9:24 AM, Anders Arpteg arp...@spotify.com wrote:
No, unfortunately we're not making use
For large jobs, the following error message is shown that seems to indicate
that shuffle files for some reason are missing. It's a rather large job
with many partitions. If the data size is reduced, the problem disappears.
I'm running a build from Spark master post 1.2 (build at 2015-01-16) and
impossible. Are you able to find any of the container logs? Is the
NodeManager launching containers and reporting some exit code?
-Sandy
On Thu, Feb 12, 2015 at 1:21 PM, Anders Arpteg arp...@spotify.com wrote:
No, not submitting from Windows, from a Debian distribution. Had a quick
look
On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg arp...@spotify.com wrote:
Hi,
run
manually to trace at what line the error has occurred.
BTW are you submitting the job from Windows?
On Thu, Feb 12, 2015, 3:34 PM Anders Arpteg arp...@spotify.com wrote:
Interesting to hear that it works for you. Are you using Yarn 2.2 as
well? No strange log message during startup
Hi,
Compiled the latest master of Spark yesterday (2015-02-10) for Hadoop 2.2
and failed executing jobs in yarn-cluster mode for that build. Works
successfully with Spark 1.2 (and also master from 2015-01-16), so something
has changed since then that prevents the job from receiving any executors
the second time the app gets launched.
On Thu, Jan 15, 2015 at 3:01 PM, Anders Arpteg arp...@spotify.com wrote:
Found a setting that seems to fix this problem, but it does not seem to be
available until Spark 1.3. See
https://issues.apache.org/jira/browse/SPARK-2165
However, glad to see work is being done on the issue.
On Tue, Jan 13, 2015 at 8:00 PM, Anders Arpteg arp...@spotify.com wrote:
Yes
at 3:29 AM, Sven Krasser kras...@gmail.com wrote:
Anders,
This could be related to this open ticket:
https://issues.apache.org/jira/browse/SPARK-5077. A call to coalesce()
also fixed that for us as a stopgap.
Best,
-Sven
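A minimal sketch of that stopgap (the partition count and output path are illustrative, not taken from the thread):

```scala
// Stopgap in the spirit of the SPARK-5077 discussion: coalesce() to
// cut the number of partitions, and thus the number of output and
// shuffle files, before saving. 200 is an arbitrary illustration.
val compacted = largeRdd.coalesce(200)
compacted.saveAsTextFile("/output/path")  // hypothetical path
```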
On Mon, Jan 12, 2015 at 10:18 AM, Anders Arpteg arp...@spotify.com
Yes Andrew, I am. Tried setting spark.yarn.applicationMaster.waitTries to 1
(thanks Sean), but with no luck. Any ideas?
On Tue, Jan 13, 2015 at 7:58 PM, Andrew Or and...@databricks.com wrote:
Hi Anders, are you using YARN by any chance?
2015-01-13 0:32 GMT-08:00 Anders Arpteg arp
Since starting using Spark 1.2, I've experienced an annoying issue with
failing apps that get executed twice. I'm not talking about tasks inside a
job, that should be executed multiple times before failing the whole app.
I'm talking about the whole app, that seems to close the previous Spark
...@cloudera.com wrote:
Hi Anders,
Have you checked your NodeManager logs to make sure YARN isn't killing
executors for exceeding memory limits?
-Sandy
On Tue, Jan 6, 2015 at 8:20 AM, Anders Arpteg arp...@spotify.com wrote:
Hey,
I have a job that keeps failing if too much data is processed
Scala
collection of dates and invoking a Spark operation for each. Simply
write dateList.par.map(...) to make the local map proceed in
parallel. It should invoke the Spark jobs simultaneously.
On Fri, Jan 9, 2015 at 10:46 AM, Anders Arpteg arp...@spotify.com wrote:
Hey,
Let's say we have multiple independent jobs that each transform some data
and store it in distinct HDFS locations, is there a nice way to run them in
parallel? See the following pseudo code snippet:
dateList.map(date =>
sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))
It's unfortunate
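The parallel-collection approach suggested above could be sketched like this (`transform` and the paths are hypothetical stand-ins; `sc.textFile`/`saveAsTextFile` stand in for the pseudo code's `hdfsFile`/`saveAsHadoopFile`):

```scala
// Map over a parallel Scala collection so each closure submits its
// own Spark job; the independent per-date jobs then run concurrently.
dateList.par.map { date =>
  sc.textFile(s"/data/input/$date")
    .map(transform)
    .saveAsTextFile(s"/data/output/$date")
}
```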
Hey,
I have a job that keeps failing if too much data is processed, and I can't
see how to get it working. I've tried repartitioning with more partitions
and increasing the amount of memory for the executors (now about 12G and 400
executors). Here is a snippet of the first part of the code, which
at 11:06 PM, Anders Arpteg arp...@spotify.com
wrote:
Hey,
Tried to get the new spark.dynamicAllocation.enabled feature working on
Yarn (Hadoop 2.2), but am unsuccessful so far. I've tested with the
following settings:
conf
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
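For reference, a sketch of the SparkConf involved (values must be strings, as SparkConf expects; the min/max executor numbers are illustrative, and on YARN the external shuffle service must also be enabled in the NodeManagers, not only on the Spark side):

```scala
import org.apache.spark.SparkConf

// Dynamic allocation settings as discussed above; min/max values
// below are illustrative assumptions, not from the thread.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")    // illustrative
  .set("spark.dynamicAllocation.maxExecutors", "100")  // illustrative
```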