Unsubscribe

2018-06-14 Thread Kumar S, Sajive
Unsubscribe

Unsubscribe

2018-06-14 Thread Congxian Qiu
Unsubscribe -- Blog: http://www.klion26.com GTalk: qcx978132955 一切随心 ("let everything follow the heart")

Support SqlStreaming in spark

2018-06-14 Thread JackyLee
Hello. Nowadays, more and more streaming products have begun to support SQL streaming, such as Kafka SQL, Flink SQL, and Storm SQL. Supporting SQL streaming can not only lower the barrier to entry for streaming, but also make streaming easier for everyone to adopt. At present, Structured Streaming is

Re: array_contains in package org.apache.spark.sql.functions

2018-06-14 Thread 刘崇光
Hello Takuya, Thanks for your message. I will do the JIRA and PR. Best regards, Chongguang On Thu, Jun 14, 2018 at 11:25 PM, Takuya UESHIN wrote: > Hi Chongguang, > > Thanks for the report! > > That makes sense and the proposition should work, or we can add something > like `def

Re: array_contains in package org.apache.spark.sql.functions

2018-06-14 Thread Takuya UESHIN
Hi Chongguang, Thanks for the report! That makes sense and the proposition should work, or we can add something like `def array_contains(column: Column, value: Column)`. Maybe other functions, such as `array_position`, `element_at`, are the same situation. Could you file a JIRA, and submit a PR
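The proposed overload takes the value as a `Column` rather than a literal, so membership is tested row by row against another column. A minimal pure-Python sketch of those semantics (illustrative only — not Spark's implementation, and the helper name is made up):

```python
def array_contains_rows(arrays, values):
    """Row-wise membership test, mirroring the proposed
    array_contains(column: Column, value: Column): for each row,
    check whether that row's array contains that row's value."""
    return [value in array for array, value in zip(arrays, values)]

# Two columns, three rows each: an array column and a value column.
arrays = [[1, 2, 3], [4, 5], [6]]
values = [2, 9, 6]
print(array_contains_rows(arrays, values))  # [True, False, True]
```

Functions like `array_position` and `element_at`, mentioned in the thread, would gain the same column-valued flexibility from an analogous overload.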

Re: [VOTE] SPIP ML Pipelines in R

2018-06-14 Thread Hossein
The vote passed with the following +1 votes: Felix, Joseph, Xiangrui, and Reynold. Joseph has kindly volunteered to shepherd this. Thanks, --Hossein On Thu, Jun 14, 2018 at 1:32 PM Reynold Xin wrote: > +1 on the proposal. > > > On Fri, Jun 1, 2018 at 8:17 PM Hossein wrote: > >> Hi Shivaram, >> >> We

Re: [VOTE] SPIP ML Pipelines in R

2018-06-14 Thread Reynold Xin
+1 on the proposal. On Fri, Jun 1, 2018 at 8:17 PM Hossein wrote: > Hi Shivaram, > > We converged on a CRAN release process that seems identical to current > SparkR. > > --Hossein > > On Thu, May 31, 2018 at 9:10 AM, Shivaram Venkataraman < > shiva...@eecs.berkeley.edu> wrote: > >> Hossein --

Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Hadrien Chicault
Unsubscribe. On Thu, Jun 14, 2018 at 8:59 PM, Jules Damji wrote: > > > Matei & I own it. I normally tweet or handle Spark related PSAs > > Cheers > Jules > > Sent from my iPhone > Pardon the dumb thumb typos :) > > > On Jun 14, 2018, at 11:45 AM, Marcelo Vanzin > wrote: > > > > Hi Jacek, > >

Re: Shared variable in executor level

2018-06-14 Thread Nikodimos Nikolaidis
Thanks, that's what I was looking for. On 06/14/2018 04:41 PM, Sean Owen wrote: > Just use a singleton or static variable. It will be a simple per-JVM > value that is therefore per-executor. > > On Thu, Jun 14, 2018 at 6:59 AM Nikodimos Nikolaidis wrote: > >

Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Jules Damji
Matei & I own it. I normally tweet or handle Spark related PSAs Cheers Jules Sent from my iPhone Pardon the dumb thumb typos :) > On Jun 14, 2018, at 11:45 AM, Marcelo Vanzin > wrote: > > Hi Jacek, > > I seriously have no idea... I don't even know who owns that account (I > hope they

Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Marcelo Vanzin
Hi Jacek, I seriously have no idea... I don't even know who owns that account (I hope they have some connection with the PMC?). But it seems whoever owns it already sent something. On Thu, Jun 14, 2018 at 12:31 AM, Jacek Laskowski wrote: > Hi Marcelo, > > How to announce it on twitter @

Unsubscribe

2018-06-14 Thread Mohamed Gabr
Unsubscribe - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Unsubscribe

2018-06-14 Thread Thiago
Unsubscribe

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Marcelo Vanzin
Yes, my bad. The code in session.py needs to also catch TypeError like before. On Thu, Jun 14, 2018 at 11:03 AM, Li Jin wrote: > Sounds good. Thanks all for the quick reply. > > https://issues.apache.org/jira/browse/SPARK-24563 > > > On Thu, Jun 14, 2018 at 12:19 PM, Xiao Li wrote: >> >> Thanks
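The fix Marcelo describes is to widen the exception handling in session.py so that a missing HiveConf falls back cleanly instead of crashing. A hedged, self-contained sketch of the pattern (the function and factory names are invented; the real code builds a Hive-enabled SparkSession):

```python
def get_or_create_session(hive_factory, fallback_factory):
    """Try a Hive-enabled session; fall back to an in-memory catalog.

    When HiveConf is absent from the classpath, the Java gateway can
    raise TypeError rather than a Java-side exception, so TypeError
    must be caught alongside the usual errors (as it was before).
    RuntimeError stands in here for the gateway error type.
    """
    try:
        return hive_factory()
    except (TypeError, RuntimeError):
        return fallback_factory()

def broken_hive_factory():
    # Simulates the failure mode: the Java class is missing.
    raise TypeError("'JavaPackage' object is not callable")

session = get_or_create_session(broken_hive_factory,
                                lambda: "in-memory session")
print(session)  # in-memory session
```

The key point is simply that `TypeError` appears in the `except` tuple; without it, the missing-HiveConf case propagates instead of falling back.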

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Li Jin
Sounds good. Thanks all for the quick reply. https://issues.apache.org/jira/browse/SPARK-24563 On Thu, Jun 14, 2018 at 12:19 PM, Xiao Li wrote: > Thanks for catching this. Please feel free to submit a PR. I do not think > Vanzin wants to introduce the behavior changes in that PR. We should do

unsubscribe

2018-06-14 Thread Huamin Li
unsubscribe

unsubscribe

2018-06-14 Thread Vasilis Hadjipanos
unsubscribe

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Xiao Li
Thanks for catching this. Please feel free to submit a PR. I do not think Vanzin wants to introduce the behavior changes in that PR. We should do the code review more carefully. Xiao 2018-06-14 9:18 GMT-07:00 Li Jin : > Are there objection to restore the behavior for PySpark users? I am happy >

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Li Jin
Are there objections to restoring the behavior for PySpark users? I am happy to submit a patch. On Thu, Jun 14, 2018 at 12:15 PM Reynold Xin wrote: > The behavior change is not good... > > On Thu, Jun 14, 2018 at 9:05 AM Li Jin wrote: > >> Ah, looks like it's this change: >> >>

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Reynold Xin
The behavior change is not good... On Thu, Jun 14, 2018 at 9:05 AM Li Jin wrote: > Ah, looks like it's this change: > > https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f4464f4f9#diff-3b5463566251d5b09fd328738a9e9bc5 > > It seems strange that by default Spark doesn't build

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Li Jin
Ah, looks like it's this change: https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f4464f4f9#diff-3b5463566251d5b09fd328738a9e9bc5 It seems strange that by default Spark doesn't build with Hive but by default PySpark requires it... This might also be a behavior change to

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Sean Owen
I think you would have to build with the 'hive' profile? But if so, that would have been true for a while now. On Thu, Jun 14, 2018 at 10:38 AM Li Jin wrote: > Hey all, > > I just did a clean checkout of github.com/apache/spark but failed to > start PySpark, this is what I did: > > git clone

Dot file from execution plan

2018-06-14 Thread Leonardo Herrera
Hi, We have an automatic report creation tool that creates Spark SQL jobs based on user instructions (this is a web application). We'd like to give users an opportunity to visualize the execution plan of their handiwork before inflicting it on the world. Currently, I'm just capturing the
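One lightweight way to get such a visualization is to walk the captured plan tree and emit Graphviz DOT text. The sketch below is a generic tree-to-DOT walker over a hypothetical `(name, children)` tuple structure — Spark's actual plan node objects differ, so treat this as an outline only:

```python
def plan_to_dot(root):
    """Render a (name, children) plan tree as a Graphviz DOT digraph."""
    lines = ["digraph plan {"]
    counter = {"next": 0}

    def visit(node):
        name, children = node
        nid = counter["next"]
        counter["next"] += 1
        lines.append(f'  n{nid} [label="{name}"];')
        for child in children:
            # Recurse first so the child gets its own node id,
            # then draw the parent -> child edge.
            lines.append(f"  n{nid} -> n{visit(child)};")
        return nid

    visit(root)
    lines.append("}")
    return "\n".join(lines)

# A toy three-operator plan: Project over Filter over a parquet scan.
plan = ("Project", [("Filter", [("Scan parquet", [])])])
print(plan_to_dot(plan))
```

The resulting string can be written to a `.dot` file and rendered with `dot -Tsvg` for display in the web application.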

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Li Jin
I can work around by using: bin/pyspark --conf spark.sql.catalogImplementation=in-memory now, but still wonder what's going on with HiveConf.. On Thu, Jun 14, 2018 at 11:37 AM, Li Jin wrote: > Hey all, > > I just did a clean checkout of github.com/apache/spark but failed to > start PySpark,

Missing HiveConf when starting PySpark from head

2018-06-14 Thread Li Jin
Hey all, I just did a clean checkout of github.com/apache/spark but failed to start PySpark, this is what I did: git clone g...@github.com:apache/spark.git; cd spark; build/sbt package; bin/pyspark And got this exception: (spark-dev) Lis-MacBook-Pro:spark icexelloss$ bin/pyspark Python 3.6.3

Re: Spark issue 20236 - overwrite a partitioned data source

2018-06-14 Thread Marco Gaido
Hi Alessandro, I'd recommend checking the UTs added in the commit which solved the issue (i.e. https://github.com/apache/spark/commit/a66fe36cee9363b01ee70e469f1c968f633c5713). You can use them to try to reproduce the issue. Thanks, Marco 2018-06-14 15:57 GMT+02:00 Alessandro Liparoti : >

Spark issue 20236 - overwrite a partitioned data source

2018-06-14 Thread Alessandro Liparoti
Good morning, I am trying to see how this bug affects writes in Spark 2.2.0, but I cannot reproduce it. Is it OK then to use the code df.write.mode(SaveMode.Overwrite).insertInto("table_name")? Thank you, *Alessandro Liparoti*

Re: Shared variable in executor level

2018-06-14 Thread Sean Owen
Just use a singleton or static variable. It will be a simple per-JVM value that is therefore per-executor. On Thu, Jun 14, 2018 at 6:59 AM Nikodimos Nikolaidis wrote: > Hello community, > > I am working on a project in which statistics (like predicate > selectivity) are collected during
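Sean's suggestion — a JVM-wide singleton on the executor — can be sketched as follows. This Python analog uses a process-level singleton with a lock (class and field names are invented; in Scala a plain `object` holding the statistics achieves the same per-executor sharing):

```python
import threading

class ExecutorStats:
    """Process-wide singleton: every task running in this process
    (analogous to tasks sharing one executor JVM) sees the same
    statistics object, so no extra network traffic is needed."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        self.selectivity_counts = {}

    @classmethod
    def get(cls):
        # Double-checked locking: cheap fast path once initialized,
        # lock-protected slow path for the first concurrent callers.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

# Two "tasks" fetch the shared instance and see each other's updates.
ExecutorStats.get().selectivity_counts["filter_a"] = 0.3
print(ExecutorStats.get().selectivity_counts)  # {'filter_a': 0.3}
```

Note this sharing is per process only: separate executors (separate JVMs) each get their own instance, which is exactly the executor-level scope the thread asks for.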

Re: Live Streamed Code Review today at 11am Pacific

2018-06-14 Thread Holden Karau
Next week is Pride in San Francisco, but I'm still going to do two quick sessions. One will be live coding with Apache Spark to collect ASF diversity information ( https://www.youtube.com/watch?v=OirnFnsU37A / https://www.twitch.tv/events/O1edDMkTRBGy0I0RCK-Afg ) on Monday at 9am Pacific and the

Shared variable in executor level

2018-06-14 Thread Nikodimos Nikolaidis
Hello community, I am working on a project in which statistics (like predicate selectivity) are collected during execution. I think it's a good idea to keep these statistics at the executor level, so that all tasks in the same executor share the same variable and no extra network traffic is needed.

Re: Very slow complex type column reads from parquet

2018-06-14 Thread Jakub Wozniak
Dear Ryan, Thanks a lot for your answer. After sending the e-mail, we investigated the data itself a bit more. It turned out that for certain days it was very skewed, and one of the row groups had many more records than all the others. This was somehow related to the fact that we have sorted it

Fwd: array_contains in package org.apache.spark.sql.functions

2018-06-14 Thread 刘崇光
-- Forwarded message -- From: 刘崇光 Date: Thu, Jun 14, 2018 at 11:08 AM Subject: array_contains in package org.apache.spark.sql.functions To: u...@spark.apache.org Hello all, I ran into a use case in a project with Spark SQL and want to share with you some thoughts about the

Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Jacek Laskowski
Hi Marcelo, How do we announce it on Twitter @ https://twitter.com/apachespark? How do we make it part of the release process? Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming