It looks like Spark 1.5.1 does not work with IPv6. When
adding -Djava.net.preferIPv6Addresses=true on my dual stack server, the
driver fails with:
15/10/14 14:36:01 ERROR SparkContext: Error initializing SparkContext.
java.lang.AssertionError: assertion failed: Expected hostname
        at ...
(the assertion comes from the host:port check; the excerpt shows `...ng(hostPort).hasPort, message)`)
On Wed, Oct 14, 2015 at 2:40 PM, Thomas Dudziak <tom...@gmail.com> wrote:
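For anyone trying to reproduce this, the flag can be passed to both the driver and the executors roughly like so (the class and jar names are placeholders; this is a sketch of the reproduction setup, not a fix):

```shell
# Sketch: prefer IPv6 addresses on driver and executors (names illustrative)
spark-submit \
  --class com.example.MyApp \
  --driver-java-options "-Djava.net.preferIPv6Addresses=true" \
  --conf "spark.executor.extraJavaOptions=-Djava.net.preferIPv6Addresses=true" \
  my-app.jar
```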
http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop
I would be curious to learn what the Spark developers' plans are in this
area (NNs, GPUs) and what they think of integration with existing NN
frameworks like Caffe or Torch.
cheers,
Tom
I want to use t-digest with foreachPartition and accumulators (essentially,
create a t-digest per partition and add that to the accumulator, leveraging
the fact that t-digests can be added to each other). I can make t-digests
kryo-serializable easily, but making them java-serializable is not so easy.
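To illustrate the per-partition merge and the serialization-wrapper idea, here is a toy Python sketch. Nothing in it is the real t-digest or Kryo: `ToyDigest` is a hypothetical stand-in that just keeps sorted samples, and the `__reduce__` hook stands in for a `writeObject`/`readObject` pair that delegates default serialization to a compact custom codec.

```python
# Toy sketch: a mergeable per-partition summary whose default (pickle)
# serialization is delegated to a compact byte codec -- analogous to the
# "kryo-serializable but not java-serializable" t-digest situation.
import pickle
import struct

class ToyDigest:
    """Toy stand-in for a t-digest: stores sorted samples (NOT a real sketch)."""
    def __init__(self, values=None):
        self.values = sorted(values or [])

    def add(self, other):
        # t-digests are mergeable; so is this toy summary
        return ToyDigest(self.values + other.values)

    def quantile(self, q):
        if not self.values:
            raise ValueError("empty digest")
        idx = min(int(q * len(self.values)), len(self.values) - 1)
        return self.values[idx]

    # Delegate pickling to the custom codec, analogous to wrapping a
    # Kryo-only class so that java serialization works on it.
    def __reduce__(self):
        return (_decode, (_encode(self),))

def _encode(d):
    return struct.pack(f"<{len(d.values)}d", *d.values)

def _decode(buf):
    n = len(buf) // 8
    return ToyDigest(struct.unpack(f"<{n}d", buf))

# One digest per partition, merged into a total -- as an accumulator would do.
parts = [ToyDigest([1, 2, 3]), ToyDigest([4, 5, 6])]
total = ToyDigest()
for p in parts:
    total = total.add(p)

# Round-trip through the default serializer via the custom codec.
roundtrip = pickle.loads(pickle.dumps(total))
print(roundtrip.quantile(0.5))  # median of 1..6 -> 4.0
```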
Now, ...
> ... a lot of garbage, making it slower. SMJ performance is probably 5x - 1000x
> better in 1.5 for your case.
>
> On Thu, Aug 27, 2015 at 6:03 PM, Thomas Dudziak <tom...@gmail.com> wrote:
>
>> I'm getting errors like "Removing executor with no recent heartbeats" ...
the answer was to bump spark.sql.shuffle.partitions further up from 1000. In my
case, 16k partitions worked for me, but your tables look a little denser, so
you may want to go even higher.
On Thu, Aug 27, 2015 at 6:04 PM Thomas Dudziak tom...@gmail.com wrote:
I'm getting errors like "Removing executor with no recent heartbeats" ...
I'm getting "Removing executor with no recent heartbeats" and "Missing an
output location for shuffle" errors for a large SparkSql join
(1bn rows/2.5TB joined with 1bn rows/30GB) and I'm not sure how to
configure the job to avoid them.
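For concreteness, the setting discussed in this thread can be raised in spark-defaults.conf (or per session); 16384 is just the figure that worked here, and the right value depends on data volume and cluster size:

```
# conf/spark-defaults.conf (value illustrative; tune to your data)
spark.sql.shuffle.partitions   16384
```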
The initial stage completes fine with some 30k tasks on
...@gmail.com wrote:
Have you tried tablesample? You find the exact syntax in the documentation, but
it does exactly what you want.
On Wed, Aug 26, 2015 at 6:12 PM, Thomas Dudziak tom...@gmail.com wrote:
Sorry, I meant without reading from all splits. This is a single
partition in the table
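For reference, a sketch of what TABLESAMPLE looks like in HiveQL (the table name is a placeholder; percent sampling uses Hive's block sampling, and bucket sampling only avoids reading all splits when the table is bucketed appropriately):

```sql
-- Block sampling: reads roughly 4% of the table's splits
SELECT * FROM big_table TABLESAMPLE(4 PERCENT) s;

-- Bucket sampling: efficient when big_table is bucketed on the sampled column
SELECT * FROM big_table TABLESAMPLE(BUCKET 1 OUT OF 25 ON rand()) s;
```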
I have a sizeable table (2.5T, 1b rows) that I want to get ~100m rows from
and I don't particularly care which rows. Doing a LIMIT unfortunately
results in two stages where the first stage reads the whole table, and the
second then performs the limit with a single worker, which is not very
efficient.
On Wed, Aug 26, 2015 at 8:53 AM, Thomas Dudziak tom...@gmail.com wrote:
For the fine-grained scheduler, there is a spark.cores.max config setting that
will limit the total # of cores it grabs. This was there in earlier
versions too.
Matei
On May 19, 2015, at 12:39 PM, Thomas Dudziak tom...@gmail.com wrote:
I read the other day that there will be a fair number of improvements in
1.4 for Mesos. Could I ask for one more (if it isn't already in there): a
configurable limit for the number of tasks for jobs run on Mesos ? This
would be a very simple yet effective way to prevent a job dominating the
cluster.
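For reference, the setting Matei mentions goes into spark-defaults.conf (the value is illustrative):

```
# conf/spark-defaults.conf -- cap the total cores a job grabs on the cluster
spark.cores.max   48
```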
Under certain circumstances that I haven't yet been able to isolate, I get
the following error when doing a HQL query using HiveContext (Spark 1.3.1
on Mesos, fine-grained mode). Is this a known problem or should I file a
JIRA for it ?
org.apache.spark.SparkException: Can only zip RDDs with same number of
elements in each partition
This is still a problem in 1.3. Optional is both used in several shaded
classes within Guava (e.g. the Immutable* classes) and itself uses shaded
classes (e.g. AbstractIterator). This causes problems in application code.
The only reliable way we've found around this is to shade Guava ourselves.
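A sketch of such self-shading with the Maven shade plugin (the relocation prefix and the plugin version are illustrative):

```xml
<!-- pom.xml fragment: relocate Guava into a private namespace -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```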
Actually the extraClassPath settings put the extra jars at the end of the
classpath so they won't help. Only the deprecated SPARK_CLASSPATH puts them
at the front.
cheers,
Tom
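So for now the deprecated variable is the only way to prepend; a sketch (the jar path and app names are placeholders):

```shell
# Deprecated, but unlike extraClassPath it prepends to the classpath
export SPARK_CLASSPATH=/path/to/our-shaded-guava.jar
spark-submit --class com.example.MyApp my-app.jar
```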
On Fri, May 15, 2015 at 11:54 AM, Marcelo Vanzin van...@cloudera.com
wrote:
Ah, I see. Yeah, it sucks that Spark has
I've just been through this exact case with shaded guava in our Mesos setup
and that is how it behaves there (with Spark 1.3.1).
cheers,
Tom
On Fri, May 15, 2015 at 12:04 PM, Marcelo Vanzin van...@cloudera.com
wrote:
On Fri, May 15, 2015 at 11:56 AM, Thomas Dudziak tom...@gmail.com wrote