Re: Is spark not good for ingesting into updatable databases?

2018-10-30 Thread ravidspark
Hi Jörn,

Just wanted to check whether you have had a chance to look at this problem. I
couldn't figure out why this is happening. Any help would be appreciated.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: unsubscribe

2018-10-30 Thread Anu B Nair
I have already sent it at least 10 times! I sent another one today!

On Tue, Oct 30, 2018 at 3:51 PM Biplob Biswas 
wrote:

> You need to send the email to user-unsubscr...@spark.apache.org and not
> to the usergroup.
>
> Thanks & Regards
> Biplob Biswas
>
>
> On Tue, Oct 30, 2018 at 10:59 AM Anu B Nair  wrote:
>
>> I have been sending this unsubscribe mail for the last few months! It never
>> works! If anyone can help us unsubscribe, it will be really helpful!
>>
>> On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha <
>> mohan.palavan...@gmail.com> wrote:
>>


Re: unsubscribe

2018-10-30 Thread Biplob Biswas
You need to send the email to user-unsubscr...@spark.apache.org and not to
the usergroup.

Thanks & Regards
Biplob Biswas


On Tue, Oct 30, 2018 at 10:59 AM Anu B Nair  wrote:

> I have been sending this unsubscribe mail for the last few months! It never
> works! If anyone can help us unsubscribe, it will be really helpful!
>
> On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha <
> mohan.palavan...@gmail.com> wrote:
>


Re: unsubscribe

2018-10-30 Thread Anu B Nair
I have been sending this unsubscribe mail for the last few months! It never
works! If anyone can help us unsubscribe, it will be really helpful!

On Tue, Oct 30, 2018 at 3:27 PM Mohan Palavancha 
wrote:



unsubscribe

2018-10-30 Thread Mohan Palavancha



Java Spark to Python spark integration

2018-10-30 Thread Manohar Rao
I would like to know if it is possible to invoke Python Spark code from Java.

I have a Java-based framework where a SparkSession is created and some
DataFrames are passed as arguments to an API.

Transformation.java
interface Transformation {
    Dataset<Row> transform(Set<Dataset<Row>> inputDatasets, SparkSession spark);
}

A user of this framework can then implement a transformation, and the
framework can use this custom transformation along with the rest of the
standard transformations. This then integrates into a larger data pipeline.

Question.

Some users would like to write their business logic in Python (PySpark).

Is it possible to pass the Java Dataset (or RDD) from the framework to
Python code and then retrieve the resulting Python RDD/Dataset back as
output to the Java framework?

Any reference to code snippets around this would be helpful.

Thanks

Manohar


Re: java vs scala for Apache Spark - is there a performance difference ?

2018-10-30 Thread Jörn Franke
Older versions of Spark indeed had lower performance for Python and R due to the
need to convert between JVM datatypes and Python/R datatypes. This changed in
Spark 2.2, I think, with the integration of Apache Arrow. However, what you do
after the conversion in those languages can still be slower than, for instance,
in Java if you do not use Spark-only functions. It could also be faster (e.g. if
you use a Python module implemented natively in C and no translation into C
datatypes is needed).
Scala has in certain cases a more elegant syntax than Java (if you do not use
lambdas). Sometimes this elegant syntax can lead to unintentionally inefficient
constructs for which there is a better way to express them (e.g. implicit
conversions, use of collection methods, etc.). However, there are better ways,
and you just have to spot these issues in the source code and address them, if
needed.
So a comparison between those languages does not really make sense - it always
depends.

> Am 30.10.2018 um 07:00 schrieb akshay naidu :
> 
> How about Python?
> Java vs Scala vs Python vs R,
> which is better?
> 
>> On Sat, Oct 27, 2018 at 3:34 AM karan alang  wrote:
>> Hello,
>> is there a "performance" difference when using Java or Scala for Apache
>> Spark?
>>
>> I understand there are other obvious differences (less code with Scala,
>> easier to focus on logic, etc.),
>> but with respect to performance, I think there would not be much of a
>> difference since both are JVM based;
>> please let me know if this is not the case.
>>
>> thanks!


Re: dremel paper example schema

2018-10-30 Thread Gourav Sengupta
Super!

Now it makes sense; I am copying Holden on this email.

Regards,
Gourav

On Tue, 30 Oct 2018, 06:34 lchorbadjiev, 
wrote:

> Hi Gourav,
>
> The question, in fact, is whether there are any limitations in Apache
> Spark's support for the Parquet file format.
>
> The example schema from the Dremel paper is something that is supported in
> Apache Parquet (using the Apache Parquet Java API).
>
> Now I am trying to implement the same schema using Apache Spark SQL types,
> but have not been very successful. And probably this is not unexpected.
>
> What was unexpected is that Apache Spark can't read a Parquet file with
> the Dremel example schema.
>
> Probably there are some limitations on what Apache Spark can support from
> the Apache Parquet file format, but it is not obvious to me what these
> limitations are.
>
> Thanks,
> Lubomir Chorbadjiev
>


Re: dremel paper example schema

2018-10-30 Thread Jörn Franke
Are you using the same Parquet version as Spark uses? Are you using a recent
version of Spark? Why don’t you create the file in Spark?

> Am 30.10.2018 um 07:34 schrieb lchorbadjiev :
> 
> Hi Gourav,
> 
> The question, in fact, is whether there are any limitations in Apache
> Spark's support for the Parquet file format.
> 
> The example schema from the Dremel paper is something that is supported in
> Apache Parquet (using the Apache Parquet Java API).
> 
> Now I am trying to implement the same schema using Apache Spark SQL types,
> but have not been very successful. And probably this is not unexpected.
> 
> What was unexpected is that Apache Spark can't read a Parquet file with
> the Dremel example schema.
> 
> Probably there are some limitations on what Apache Spark can support from
> the Apache Parquet file format, but it is not obvious to me what these
> limitations are.
> 
> Thanks,
> Lubomir Chorbadjiev
> 



Re: dremel paper example schema

2018-10-30 Thread lchorbadjiev
Hi Gourav,

The question, in fact, is whether there are any limitations in Apache
Spark's support for the Parquet file format.

The example schema from the Dremel paper is something that is supported in
Apache Parquet (using the Apache Parquet Java API).

Now I am trying to implement the same schema using Apache Spark SQL types,
but have not been very successful. And probably this is not unexpected.

What was unexpected is that Apache Spark can't read a Parquet file with the
Dremel example schema.

Probably there are some limitations on what Apache Spark can support from
the Apache Parquet file format, but it is not obvious to me what these
limitations are.

Thanks,
Lubomir Chorbadjiev






Re: java vs scala for Apache Spark - is there a performance difference ?

2018-10-30 Thread akshay naidu
How about Python?
Java vs Scala vs Python vs R,
which is better?

On Sat, Oct 27, 2018 at 3:34 AM karan alang  wrote:

> Hello,
> is there a "performance" difference when using Java or Scala for Apache
> Spark?
>
> I understand there are other obvious differences (less code with Scala,
> easier to focus on logic, etc.),
> but with respect to performance, I think there would not be much of a
> difference since both are JVM based;
> please let me know if this is not the case.
>
> thanks!
>