We have gotten this to work, but it requires instantiating the CoreNLP object
on the worker side. Because of the initialization time, it makes a lot of sense
to do this inside a .mapPartitions rather than a .map, so the object is created
once per partition instead of once per record.
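
A rough sketch of the per-partition pattern, in Python for illustration. ExpensiveAnnotator is a hypothetical stand-in for a heavyweight object such as a CoreNLP pipeline, not a real CoreNLP API, and a plain list stands in for one partition so the snippet runs without a cluster.

```python
from typing import Iterable, Iterator, List


class ExpensiveAnnotator:
    """Hypothetical stand-in for a heavyweight object (e.g. an NLP pipeline)."""

    instances_created = 0  # counts constructions, for illustration only

    def __init__(self) -> None:
        # Pretend this loads large models and takes several seconds.
        ExpensiveAnnotator.instances_created += 1

    def annotate(self, text: str) -> List[str]:
        # Placeholder "annotation": plain whitespace tokenization.
        return text.split()


def annotate_partition(records: Iterable[str]) -> Iterator[List[str]]:
    """Build the annotator once, then reuse it for every record in the partition."""
    annotator = ExpensiveAnnotator()  # created on the worker, once per partition
    for record in records:
        yield annotator.annotate(record)


# With PySpark this function would be passed as rdd.mapPartitions(annotate_partition).
# Here a plain list plays the role of a single partition.
partition = ["Spark is fast", "CoreNLP is slow to start"]
results = list(annotate_partition(partition))
```

With a .map the constructor would run for every record; here it runs once for the whole partition, which is where the savings come from.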
As an aside, if you're using it from Scala, have a look at
MLlib relies on breeze for much of its linear algebra, which in turn relies on
netlib-java. netlib-java will first attempt to load a native BLAS at runtime and,
if that fails, attempt to load its own precompiled version. Failing that, it will
fall back to a Java version that it has built in. The Java
How many files do you have and how big is each JSON object?
Spark generally works better with a few large files than with many small ones,
so you could try cat'ing your files together and rerunning the same experiment.
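
If you'd rather script the concatenation than use cat, something like the following might do. It is only a sketch: it assumes each small file holds single-line JSON records, and the function name, the "*.json" pattern, and the file names are made up for the example.

```python
import tempfile
from pathlib import Path


def concatenate_files(input_dir: Path, output_file: Path, pattern: str = "*.json") -> int:
    """Append every file matching `pattern` to `output_file`; return the file count.

    Assumes `output_file` lives outside `input_dir`, so it is not picked up
    by the glob itself.
    """
    count = 0
    with output_file.open("wb") as out:
        for path in sorted(input_dir.glob(pattern)):
            data = path.read_bytes()
            out.write(data)
            # Keep one record per line even if a file lacks a trailing newline.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            count += 1
    return count


# Small demonstration on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "small"
    src.mkdir()
    (src / "a.json").write_text('{"id": 1}')
    (src / "b.json").write_text('{"id": 2}\n')
    merged = Path(tmp) / "merged.json"
    n = concatenate_files(src, merged)
    merged_text = merged.read_text()
```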
- Evan
On Oct 18, 2014, at 12:07 PM, jan.zi...@centrum.cz jan.zi...@centrum.cz
wrote:
Try s3n://
On Aug 6, 2014, at 12:22 AM, sparkuser2345 hm.spark.u...@gmail.com wrote:
I'm getting the same "Input path does not exist" error even after setting the
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and using
the format s3://bucket-name/test_data.txt for the
These files follow the libsvm format: each line is a record, the first column
is the label, and the remaining fields are offset:value pairs, where offset is
the position in the feature vector and value is the value of that input
feature.
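
To make the layout concrete, here is a minimal parsing sketch. The function name and the sample line are invented for illustration, and offsets are kept exactly as written in the file, with no index shifting. For real work MLlib's own loader (MLUtils.loadLibSVMFile) is likely the better choice.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one 'label offset:value offset:value ...' record."""
    fields = line.split()
    label = float(fields[0])  # the first column is the label
    features: Dict[int, float] = {}
    for field in fields[1:]:
        offset, value = field.split(":", 1)
        features[int(offset)] = float(value)  # only the listed entries are stored
    return label, features


# Made-up sample record: label 1, three non-zero features.
label, features = parse_libsvm_line("1 3:0.5 7:2.0 12:1")
```

Only the listed offsets are stored; everything else in the feature vector is implicitly zero.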
This is a fairly efficient representation for sparse