Some Spark MLLIB tests failing due to some classes not being registered with Kryo

2017-11-11 Thread Jorge Sánchez
Hi Dev,

I'm running the MLLIB tests in the current Master branch and the following
Suites are failing due to some classes not being registered with Kryo:

org.apache.spark.mllib.MatricesSuite
org.apache.spark.mllib.VectorsSuite
org.apache.spark.ml.InstanceSuite

I can solve it by registering the failing classes with Kryo, but I'm
wondering if I'm missing something as these tests shouldn't be failing from
Master.

Any suggestions on what I may be doing wrong?

Thank you.
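For reference, the workaround mentioned above can be sketched as follows. The specific classes listed are illustrative only; the actual ones to register are whichever classes the failing suites report as unregistered:

```scala
import org.apache.spark.SparkConf

// Sketch of the workaround: register the offending classes explicitly on
// the SparkConf used by the tests. The class list below is illustrative.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(
    classOf[org.apache.spark.mllib.linalg.DenseMatrix],
    classOf[org.apache.spark.mllib.linalg.SparseMatrix],
    classOf[org.apache.spark.mllib.linalg.DenseVector],
    classOf[org.apache.spark.mllib.linalg.SparseVector]
  ))
```

With `spark.kryo.registrationRequired` set to `true`, Kryo throws on any unregistered class, which is exactly the failure mode the suites are hitting.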


Re: Some Spark MLLIB tests failing due to some classes not being registered with Kryo

2017-11-11 Thread Marco Gaido
Hi Jorge,

then try running the tests not from the mllib folder, but from the Spark base
directory.
If you want to run only the tests in mllib, you can specify the project
using the -pl argument of mvn.

Thanks,
Marco
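
Concretely, the two invocations would look something like this (a sketch assuming Spark's bundled Maven wrapper; flags may vary slightly by Spark version):

```shell
# From the Spark base directory:
./build/mvn -DskipTests clean install   # build and install all modules first
./build/mvn -pl mllib test              # then run only the mllib module's tests

# Alternatively, -am ("also make") builds mllib's dependencies in the same run:
./build/mvn -pl mllib -am test
```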





Re: Some Spark MLLIB tests failing due to some classes not being registered with Kryo

2017-11-11 Thread Jorge Sánchez
No luck, either running the full test suites with mvn test from the base
directory or with just mvn -pl mllib.

Any other suggestion would be much appreciated.

Thank you.



how to replace hdfs with a custom distributed fs ?

2017-11-11 Thread Cristian Lorenzetto
Hi, I have my own distributed Java file system and I would like to implement a
class so that Spark can store data on it.
How do I do this? Is there an example of how to do it?


Re: how to replace hdfs with a custom distributed fs ?

2017-11-11 Thread Reynold Xin
You can implement the Hadoop FileSystem API for your distributed java fs
and just plug into Spark using the Hadoop API.
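
A minimal sketch of what that looks like (the class name and the `myfs` scheme are placeholders; each stub must delegate to your own FS client):

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FSDataOutputStream, FileStatus, FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission
import org.apache.hadoop.util.Progressable

// Skeleton only: each ??? must be replaced by a call into your FS client.
class MyFileSystem extends FileSystem {
  private var fsUri: URI = _

  override def initialize(name: URI, conf: Configuration): Unit = {
    super.initialize(name, conf)
    fsUri = name
  }
  override def getUri: URI = fsUri
  override def getScheme: String = "myfs"

  override def open(f: Path, bufferSize: Int): FSDataInputStream = ???
  override def create(f: Path, permission: FsPermission, overwrite: Boolean,
      bufferSize: Int, replication: Short, blockSize: Long,
      progress: Progressable): FSDataOutputStream = ???
  override def append(f: Path, bufferSize: Int, progress: Progressable): FSDataOutputStream = ???
  override def rename(src: Path, dst: Path): Boolean = ???
  override def delete(f: Path, recursive: Boolean): Boolean = ???
  override def listStatus(f: Path): Array[FileStatus] = ???
  override def setWorkingDirectory(newDir: Path): Unit = ???
  override def getWorkingDirectory: Path = ???
  override def mkdirs(f: Path, permission: FsPermission): Boolean = ???
  override def getFileStatus(f: Path): FileStatus = ???
}
```

Spark then picks the implementation up through the Hadoop configuration, e.g. `--conf spark.hadoop.fs.myfs.impl=com.example.MyFileSystem` (package name is hypothetical), after which paths like `myfs://host/data` resolve through your class.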




is there a way for removing hadoop from spark

2017-11-11 Thread Cristian Lorenzetto
Considering that I don't need HDFS, is there a way to remove Hadoop from Spark
completely?
Is YARN the only Hadoop dependency in Spark?
Is there a Java or Scala (JVM language) YARN-like library that can be embedded
in a project, instead of calling external servers?
Is the YARN library difficult to customize?

I'm asking these separate questions to understand what the best approach is for me.


Re: is there a way for removing hadoop from spark

2017-11-11 Thread yohann jardin
Hey Cristian,

You don’t need to remove anything. Spark has a standalone mode. Actually that’s 
the default. https://spark.apache.org/docs/latest/spark-standalone.html

When building Spark (and you should build it yourself), just use the option 
that suits you: https://spark.apache.org/docs/latest/building-spark.html
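
For example, a Hadoop-free standalone build and launch might look like this (a sketch; script names and profiles may vary by Spark version):

```shell
# Build a distribution without bundling the Hadoop jars
# (the "hadoop-provided" profile controls bundling, not linkage):
./dev/make-distribution.sh --name no-hadoop -Phadoop-provided

# Standalone cluster: no YARN involved at all.
./sbin/start-master.sh
./sbin/start-slave.sh spark://<master-host>:7077
```

Note that Spark still links against the Hadoop client libraries for its filesystem abstraction even in standalone mode; the profile only decides whether those jars are bundled in the distribution or supplied by you.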

Regards,

Yohann Jardin




Spark json data - avro schema validation

2017-11-11 Thread Barath Ramamoorthy

Hi, I have a Spark Streaming application which receives logs that contain encoded 
JSON. The JSON conforms to an Avro schema, and as part of the process I'm 
converting the JSON to a data class, which of course becomes a row in a Dataset. 
It is a nested object.

In this scenario I'm looking to validate the inbound JSON to check whether it 
conforms to the Avro schema definition. I haven't found any existing approach for 
this validation, or I'm not aware of one. I'm hoping to get some direction from 
this group so I can get going on the validation front.

Thanks
Barath.
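
One possible approach (a sketch; the helper name is illustrative) is to use Avro's own JSON decoder: attempting a strict decode against the schema doubles as validation, since a mismatch raises an exception.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Returns true if the JSON string decodes cleanly against the Avro schema.
// Type mismatches raise AvroTypeException; malformed JSON raises IOException.
def conformsToSchema(json: String, schemaJson: String): Boolean = {
  val schema = new Schema.Parser().parse(schemaJson)
  val reader = new GenericDatumReader[GenericRecord](schema)
  try {
    val decoder = DecoderFactory.get().jsonDecoder(schema, json)
    reader.read(null, decoder)
    true
  } catch {
    case scala.util.control.NonFatal(_) => false
  }
}
```

In a streaming job this could run as a filter (or a flatMap that routes failures to a dead-letter sink) before the conversion to the data class. Parsing the schema once and reusing it per batch avoids repeating that cost per record.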