Greetings!
We're reading input files with newAPIHadoopFile, configured with a
multiline split. Everything works fine, except for
https://issues.apache.org/jira/browse/MAPREDUCE-6549. It looks like the
issue is fixed, but only in Hadoop 2.7.2, which means we have to download
Spark without Hadoop and
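A minimal sketch of the kind of setup described above, assuming the standard Hadoop `TextInputFormat` with a custom record delimiter (the input path and the blank-line delimiter are assumptions for illustration, and the code requires the MAPREDUCE-6549 fix on the classpath to split multi-line records correctly):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MultilineRead {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("multiline-read"));
        Configuration hadoopConf = new Configuration();
        // Records are separated by a blank line instead of '\n'.
        // LineRecordReader honours this key; MAPREDUCE-6549 covers the
        // case where a multi-byte delimiter straddles a split boundary.
        hadoopConf.set("textinputformat.record.delimiter", "\n\n");
        JavaPairRDD<LongWritable, Text> records = sc.newAPIHadoopFile(
                "hdfs:///data/input",   // assumed input path
                TextInputFormat.class, LongWritable.class, Text.class,
                hadoopConf);
        System.out.println(records.count());
        sc.stop();
    }
}
```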
Hi,
I have a web service that provides rest api to train random forest algo.
I train random forest on a 5 nodes spark cluster with enough memory -
everything is cached (~22 GB).
On small datasets of up to 100k samples everything is fine, but with the
biggest one (400k samples and ~70k features)
> Joseph
>
> On Mon, Dec 14, 2015 at 10:52 AM, Eugene Morozov <
> evgeny.a.moro...@gmail.com> wrote:
>
>> Hello!
>>
>> I'm currently working on a POC and trying to use Random Forest (classification
>> and regression). I also have to check SVM and Mul
Hi, the feature looks like the one I'd like to use, but the docs give two
different descriptions of its availability.
I'm on a standalone deployment mode, and here:
http://spark.apache.org/docs/latest/configuration.html it says the
feature is available only for YARN, but here:
Hi, I'm not sure where to put this, but I've found a typo on the Caching
page:
"
- *Serialization:* The default serialization in Spark is Java
serialization. However for better peformance, we recommend Kyro
serialization, which you can learn more about here
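The recommendation quoted above (Kryo, not "Kyro") is a one-line configuration change; a minimal sketch of a `spark-defaults.conf` fragment, assuming the default Kryo buffer sizes are otherwise acceptable:

```
spark.serializer    org.apache.spark.serializer.KryoSerializer
```

Classes can additionally be pre-registered with Kryo via `SparkConf.registerKryoClasses` to avoid writing full class names into the serialized stream.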
Praveen,
Zeppelin uses Spark's REPL.
I'm currently writing an app that is a web service, which is going to run
Spark jobs.
So, at the init stage I just create a JavaSparkContext and then use it for
all user requests. The web service is stateless. The issue with being
stateless is that it's possible to run
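A hypothetical sketch of the shared-context pattern described above: one JavaSparkContext created at init and reused by every request. SparkContext is thread-safe for submitting jobs, so concurrent requests can share it; tagging each request with a job group (the `requestId` naming is an assumption) lets a single request's jobs be cancelled as a unit:

```java
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkJobService {
    private final JavaSparkContext sc;

    public SparkJobService() {
        // Created once at service init, reused for every request.
        sc = new JavaSparkContext(new SparkConf().setAppName("web-service"));
    }

    public long handleRequest(String requestId, List<Integer> data) {
        // Tag this request's jobs so they can be tracked/cancelled together.
        sc.setJobGroup(requestId, "jobs for request " + requestId);
        return sc.parallelize(data).count();
    }

    public void cancel(String requestId) {
        // Cancels all running jobs tagged with this group id.
        sc.cancelJobGroup(requestId);
    }
}
```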
Hi!
I've looked through the issues and haven't found anything like this, so I've
created a new one. Everything needed to reproduce it is attached:
https://issues.apache.org/jira/browse/SPARK-12367
Could you please take a look and, if possible, advise a workaround.
Thank you in advance.
--
Be well!
Hello!
I'm currently working on a POC and trying to use Random Forest (classification
and regression). I also have to check SVM and Multiclass perceptron (other
algos are less important at the moment). So far I've discovered that Random
Forest has a maxDepth limitation for its trees, and just out of
.
--
Be well!
Jean Morozov
On Tue, Oct 6, 2015 at 1:58 AM, Davies Liu <dav...@databricks.com> wrote:
> Could you tell us a way to reproduce this failure? Reading from JSON or
> Parquet?
>
> On Mon, Oct 5, 2015 at 4:28 AM, Eugene Morozov
> <evgeny.a.moro...@gmail.com> w
Hi,
We're building our own framework on top of Spark, and we give users a pretty
complex schema to work with. That requires us to build DataFrames
ourselves: we transform business objects into rows and struct types and use
these two to create a DataFrame.
Everything was fine until I started to
Hi,
I'm using Spark 1.3.1 built against Hadoop 1.0.4 and Java 1.7, and I'm
trying to save my DataFrame to Parquet.
The issue I'm stuck on looks like serialization trying to do a pretty weird
thing: it tries to write to an empty array.
The last (per the stack trace) line of Spark code that leads to
Hi!
I’d like to perform an action (store / print something) inside a transformation
(map or mapPartitions). This approach has some flaws, but here is the question:
might it happen that Spark optimises the (RDD or DataFrame) processing so that my
mapPartitions simply won’t happen?
--
Eugene Morozov
more previous
discussions RE: Kryo upgrade.
Anyhow, I'm not sure what the right solution is yet, but just wanted to link
to some previous context / discussions.
- Josh
On Thu, Jul 16, 2015 at 7:57 AM, Eugene Morozov fathers...@list.ru wrote:
Hi, some time ago we’ve found that it’s
.
Thanks.
--
Eugene Morozov
fathers...@list.ru
.
Eugene Morozov
fathers...@list.ru
class not found.
Is my “new” understanding correct? Could you please explain in a couple of
words how code is moved from the Driver to the Workers? Could you give me a
hint as to where to find this in the sources?
Thanks in advance.
--
Eugene Morozov
fathers...@list.ru