Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe

2018-09-25 Thread Meethu Mathew
Hi all, I tried the following code and the output was not as expected. schema = StructType([StructField('Id', StringType(), False), StructField('Value', FloatType(), False)]) df_test = spark.createDataFrame([('a',5.0),('b',1.236),('c',-0.31)],schema) df_test Output :
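The symptom described here is consistent with 32-bit float precision: `FloatType` stores a 4-byte IEEE 754 float, so a literal such as 1.236 cannot be represented exactly, and an equality filter against the Python double 1.236 will miss the stored value. A minimal pure-Python sketch of the underlying precision loss (no Spark required):

```python
import struct

def to_float32(x):
    """Round-trip a Python double through a 4-byte IEEE 754 float,
    mimicking what storing the value in a FloatType column does."""
    return struct.unpack('f', struct.pack('f', x))[0]

stored = to_float32(1.236)
print(stored)                       # close to, but not exactly, 1.236
print(stored == 1.236)              # False: float32 value != double literal
print(abs(stored - 1.236) < 1e-6)   # True: compare with a tolerance instead
```

Two common fixes are to declare the column as `DoubleType` (Python floats are doubles, so no narrowing occurs) or to filter with a tolerance rather than exact equality.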

RE: Python kubernetes spark 2.4 branch

2018-09-25 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Ilan/ Yinan, Yes my test case is also similar to the one described in https://issues.apache.org/jira/browse/SPARK-24736 My spark-submit is as follows: ./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py

Re: [DISCUSS] Cascades style CBO for Spark SQL

2018-09-25 Thread Xiao Li
Hi, Xiaoju, Thanks for sending this to the dev list. The current join reordering rule is just a stats based optimizer rule. Either top-down or bottom-up optimization can achieve the same-level optimized plans. DB2 is using bottom up. In the future, we plan to move the stats based join reordering

[Discuss] Language Interop for Apache Spark

2018-09-25 Thread tcondie
There seems to be some desire for third party language extensions for Apache Spark. Some notable examples include: * C#/F# from project Mobius https://github.com/Microsoft/Mobius * Haskell from project sparkle https://github.com/tweag/sparkle * Julia from project Spark.jl

Re: Python kubernetes spark 2.4 branch

2018-09-25 Thread Ilan Filonenko
Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ? On Tue, Sep 25, 2018 at 12:38 PM Yinan Li wrote: > Can you give more details on how you ran your app, did you build your own > image, and which image are you using? > > On Tue, Sep 25, 2018 at 10:23 AM Garlapati,

Accumulator issues in PySpark

2018-09-25 Thread Abdeali Kothari
I was trying to check out accumulators and see if I could use them for anything. I made a demo program and could not figure out how to add them up. I found that I need to do a shuffle between all my python UDFs that I am running for the accumulators to be run. Basically, if I do 5 withColumn()

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-25 Thread Ryan Blue
I agree with Wenchen that we'd remove the prefix when passing to a source, so you could use the same "spark.yarn.keytab" option in both places. But I think the problem is that "spark.yarn.keytab" still needs to be set, and it clearly isn't in a shared namespace for catalog options. So I think we
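As a sketch of the prefix convention under discussion (the namespace and helper name here are hypothetical, not actual Spark API), the idea is that options set under a per-source namespace have the prefix stripped before being handed to the source, so the source still sees plain keys like `spark.yarn.keytab`:

```python
def options_for_source(conf, source_name):
    """Hypothetical helper: strip the per-source namespace prefix
    from matching config keys before passing them to the source."""
    prefix = "spark.datasource.%s." % source_name
    return {k[len(prefix):]: v
            for k, v in conf.items()
            if k.startswith(prefix)}

conf = {
    "spark.datasource.mysource.spark.yarn.keytab": "/path/to/keytab",
    "spark.app.name": "demo",  # not in the source's namespace, ignored
}
print(options_for_source(conf, "mysource"))
# {'spark.yarn.keytab': '/path/to/keytab'}
```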

Re: Python kubernetes spark 2.4 branch

2018-09-25 Thread Yinan Li
Can you give more details on how you ran your app, did you build your own image, and which image are you using? On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) wrote: > Hi, > > I am trying to run spark python testcases on k8s based on tag > spark-2.4-rc1. When

Re: Support for Second level of concurrency

2018-09-25 Thread Sandeep Mahendru
Hey Jörn, Appreciate the prompt reply. Yeah that would surely work, we have tried a similar approach. The only concern here is that to make the solution low latency, we want to avoid routing through a message broker. Regards, Sandeep. On Tue, Sep 25, 2018 at 12:53 PM Jörn Franke wrote: >

Python kubernetes spark 2.4 branch

2018-09-25 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi, I am trying to run spark python testcases on k8s based on tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not getting resolved by the main python script. Please let me know, is this a known issue? Regards Surya

Re: Support for Second level of concurrency

2018-09-25 Thread Jörn Franke
What is the ultimate goal of this algorithm? There could be already algorithms that can do this within Spark. You could also put a message on Kafka (or another broker) and have spark applications listen to them to trigger further computation. This would be also more controlled and can be done
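The broker-based pattern Jörn describes can be sketched in-process, with a queue standing in for the Kafka topic (all names here are illustrative): one side publishes a "work ready" message, and a listening worker picks it up and triggers the follow-on computation:

```python
import queue
import threading

broker = queue.Queue()  # stands in for a Kafka topic
results = []

def listener():
    # Stands in for a Spark application consuming the topic and
    # triggering further computation for each message.
    while True:
        msg = broker.get()
        if msg is None:                # shutdown sentinel
            break
        results.append(msg * msg)      # the triggered computation

t = threading.Thread(target=listener)
t.start()
for payload in [2, 3, 4]:
    broker.put(payload)                # "publish" a trigger message
broker.put(None)
t.join()
print(results)  # [4, 9, 16]
```

As the thread notes, a real broker adds latency but gives you controlled, decoupled triggering between applications.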

Re: Support for Second level of concurrency

2018-09-25 Thread Reynold Xin
That’s a pretty major architectural change and would be extremely difficult to do at this stage. On Tue, Sep 25, 2018 at 9:31 AM sandeep mehandru wrote: > Hi Folks, > > There is a use-case, where we are doing large computation on two large > vectors. It is basically a scenario, where we run

Support for Second level of concurrency

2018-09-25 Thread sandeep mehandru
Hi Folks, There is a use-case, where we are doing large computation on two large vectors. It is basically a scenario, where we run a flatmap operation on the left vector and run correlation logic by comparing it with all the rows of the second vector. When this flatmap operation is running on
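The access pattern described (every row of the left vector compared against all rows of the right vector) is commonly handled in Spark by broadcasting the smaller side rather than nesting a second level of parallelism. A minimal pure-Python sketch of the per-row correlation step inside such a flatmap (names and data are illustrative):

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

left = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]   # the distributed side
right = [[2.0, 4.0, 6.0]]                    # in Spark, broadcast this side

# The flatmap body: each left row is scored against every right row.
scores = [pearson(l, r) for l in left for r in right]
print(scores)
```

With the right side broadcast, each executor holds a full local copy, so the comparison stays a plain loop inside one task and no second level of Spark concurrency is needed.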

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-25 Thread tigerquoll
To give some Kerberos specific examples, the spark-submit args: --conf spark.yarn.keytab=path_to_keytab --conf spark.yarn.principal=princi...@realm.com are currently not passed through to the data sources.