Re: Correlated subqueries in the DataFrame API

2018-04-27 Thread Nicholas Chammas
What about exposing transforms that make it easy to coerce data to what the method needs? Instead of passing a dataframe, you’d pass df.toSet to isin. Assuming toSet returns a local list, wouldn’t that have the problem of not being able to handle extremely large lists? In contrast, I believe SQL’s
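For context, a minimal sketch of the tradeoff being discussed, using made-up table and column names (dimDf, factDf, customer_id): isin needs the values locally on the driver, whereas a left semi join keeps the membership test distributed, which is roughly what a SQL IN (SELECT ...) subquery does.

import org.apache.spark.sql.SparkSession

object IsinVsSemiJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("isin-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical data: a set of allowed ids and a table to filter.
    val dimDf  = Seq(1L, 2L, 3L).toDF("id")
    val factDf = Seq((1L, "a"), (4L, "b")).toDF("customer_id", "value")

    // isin takes local values, so they must first be collected to the driver;
    // this is the step that cannot scale to extremely large lists.
    val ids = dimDf.as[Long].collect()
    factDf.filter($"customer_id".isin(ids: _*)).show()

    // A left semi join expresses the same membership test without collecting
    // anything to the driver.
    factDf.join(dimDf, factDf("customer_id") === dimDf("id"), "left_semi").show()

    spark.stop()
  }
}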

Re: [MLLib] Logistic Regression and standardization

2018-04-27 Thread Valeriy Avanesov
Hi all, maybe I'm missing something, but from what was discussed here I've gathered that the current mllib implementation returns exactly the same model whether standardization is turned on or off. I suggest considering an R script (please, see below) which trains two penalized logistic
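For reference on the Spark side (a sketch, not the poster's R script), spark.ml's LogisticRegression exposes the flag directly, so the comparison being described can be reproduced by fitting the same penalized model twice; `training` is assumed to be a DataFrame with the usual "label"/"features" columns.

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.DataFrame

// `training` is assumed: a DataFrame with "label" and "features" columns.
def compareStandardization(training: DataFrame): Unit = {
  val lrStd = new LogisticRegression()
    .setRegParam(0.1)              // penalized, as in the R comparison
    .setStandardization(true)

  val lrNoStd = new LogisticRegression()
    .setRegParam(0.1)
    .setStandardization(false)

  val mStd   = lrStd.fit(training)
  val mNoStd = lrNoStd.fit(training)

  // If the fitted model really were identical regardless of the flag, these
  // coefficient vectors would match; with regularization one would normally
  // expect them to differ.
  println(s"standardization=true : ${mStd.coefficients} intercept=${mStd.intercept}")
  println(s"standardization=false: ${mNoStd.coefficients} intercept=${mNoStd.intercept}")
}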

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Thakrar, Jayesh
Thanks Joseph! From: Joseph Torres Date: Friday, April 27, 2018 at 11:23 AM To: "Thakrar, Jayesh" Cc: "dev@spark.apache.org" Subject: Re: Datasource API V2 and checkpointing The precise interactions with the

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Joseph Torres
The precise interactions with the DataSourceV2 API haven't yet been hammered out in design. But much of this comes down to the core of Structured Streaming rather than the API details. The execution engine handles checkpointing and recovery. It asks the streaming data source for offsets, and then
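A deliberately simplified sketch of that division of labor, using hypothetical interfaces rather than the real DataSourceV2 ones: the engine owns the offset log and replays the same offset range on recovery, while the source only reports available offsets and serves data for a requested range.

import scala.collection.mutable.ArrayBuffer

// Hypothetical, minimal interfaces; not the actual DataSourceV2 API.
trait OffsetSource {
  def latestOffset(): Long
  def getBatch(start: Long, end: Long): Seq[String]
}

class MicroBatchEngine(source: OffsetSource) {
  // Stand-in for the checkpoint/offset log the engine persists.
  private val offsetLog = ArrayBuffer[Long](0L)

  def runOneBatch(): Unit = {
    val start = offsetLog.last
    val end   = source.latestOffset()
    // The planned range is logged before execution so a failed batch is
    // re-run over exactly the same offsets after recovery.
    offsetLog += end
    val batch = source.getBatch(start, end)
    batch.foreach(record => println(s"processing $record")) // stand-in for execution
  }
}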

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Thakrar, Jayesh
Wondering if this issue is related to SPARK-23323? Any pointers will be greatly appreciated…. Thanks, Jayesh From: "Thakrar, Jayesh" Date: Monday, April 23, 2018 at 9:49 PM To: "dev@spark.apache.org" Subject: Datasource API V2 and

Re: Sorting on a streaming dataframe

2018-04-27 Thread Hemant Bhanawat
I see. monotonically_increasing_id on streaming DataFrames will be really helpful to me and, I believe, to many more users. Adding this functionality to Spark would be more efficient than implementing it inside applications. Hemant On Thu, Apr 26,
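For context, a sketch of what the request amounts to (made-up data, written as a spark-shell snippet where `spark` is predefined): the batch API already provides the column, and the ask is to allow the same thing on a streaming DataFrame.

import org.apache.spark.sql.functions.monotonically_increasing_id
import spark.implicits._   // assumes the spark-shell SparkSession named `spark`

// Works today on a batch DataFrame; the proposal is to support the same
// column on streaming DataFrames.
val df = Seq("a", "b", "c").toDF("value")
df.withColumn("id", monotonically_increasing_id()).show()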

Re: saveAsNewAPIHadoopDataset must not enable speculation for parquet file?

2018-04-27 Thread cane
Thanks Steve! I will study the links you mentioned!
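For readers following the thread subject: speculative execution is governed by the standard spark.speculation conf, so a job whose output committer is not speculation-safe can turn it off explicitly. A minimal sketch (hypothetical app name; the Parquet write itself is elided):

import org.apache.spark.sql.SparkSession

object NoSpeculationWrite {
  def main(args: Array[String]): Unit = {
    // spark.speculation controls speculative task attempts; disabling it
    // avoids duplicate attempts racing to write the same output.
    val spark = SparkSession.builder()
      .appName("parquet-write-no-speculation")
      .config("spark.speculation", "false")
      .getOrCreate()

    // ... write Parquet output here ...

    spark.stop()
  }
}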