Re: [build system] short downtime next thursday morning, 5-12-16 @ 8am PDT

2016-05-11 Thread shane knapp
reminder: this is happening tomorrow morning!
7am PDT: builds paused
8am PDT: master reboot, upgrade happens
9am PDT: builds restarted
On Mon, May 9, 2016 at 4:17 PM, shane knapp wrote: > reminder: this is happening thursday morning. > > On Wed, May 4, 2016 at 11:38

Re: Adding HDFS read-time metrics per task (RE: SPARK-1683)

2016-05-11 Thread Reynold Xin
Adding Kay On Wed, May 11, 2016 at 12:01 PM, Brian Cho wrote: > Hi, > > I'm interested in adding read-time (from HDFS) to Task Metrics. The > motivation is to help debug performance issues. After some digging, it's > briefly mentioned in SPARK-1683 that this feature didn't
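For context, the closest existing hook for this kind of debugging is the listener API: TaskMetrics already exposes input bytes and records read, though not read *time*, which is the gap this thread is about. A minimal sketch against the Spark 2.x listener API, assuming a running SparkContext named sc:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Logs per-task input metrics as tasks finish. Note that TaskMetrics
    // exposes bytes/records read from HDFS (or other sources) but not the
    // time spent reading -- the metric this thread proposes adding.
    class InputMetricsListener extends SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val metrics = taskEnd.taskMetrics
        if (metrics != null) {   // metrics can be null for failed tasks
          val in = metrics.inputMetrics
          println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
            s"bytesRead=${in.bytesRead} recordsRead=${in.recordsRead}")
        }
      }
    }

    sc.addSparkListener(new InputMetricsListener)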

Shrinking the DataFrame lineage

2016-05-11 Thread Ulanov, Alexander
Dear Spark developers, Recently, I was trying to switch my code from RDDs to DataFrames in order to compare the performance. The code computes an RDD in a loop. I use RDD.persist followed by RDD.count to force Spark to compute the RDD and cache it, so that it does not need to re-compute it on each
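The RDD pattern being described looks roughly like the sketch below, where update and iterations stand in for the actual per-step computation; periodic checkpointing is the usual way to keep the lineage from growing without bound:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    // Stand-ins for the real loop body (illustrative only).
    def update(r: RDD[Double]): RDD[Double] = r.map(_ * 1.01)
    val iterations = 50

    sc.setCheckpointDir("/tmp/spark-checkpoints")
    var rdd: RDD[Double] = sc.parallelize(1 to 1000000).map(_.toDouble)

    for (i <- 1 to iterations) {
      val next = update(rdd)
      next.persist(StorageLevel.MEMORY_ONLY)
      if (i % 10 == 0) next.checkpoint()  // truncate the lineage periodically
      next.count()                        // action: force computation and caching
      rdd.unpersist()
      rdd = next
    }

DataFrames in 1.6 have persist/count but no checkpoint(); one workaround for a growing query plan is to round-trip through the RDD API, e.g. sqlContext.createDataFrame(df.rdd, df.schema), which starts a fresh plan from the materialized rows.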

Re: dataframe udf function will be executed twice when filtering on new column created by withColumn

2016-05-11 Thread James Hammerton
This may be related to: https://issues.apache.org/jira/browse/SPARK-13773 Regards, James On 11 May 2016 at 15:49, Ted Yu wrote: > In the master branch, the behavior is the same. > > Suggest opening a JIRA if you haven't done so. > > On Wed, May 11, 2016 at 6:55 AM, Tony Jin

Re: dataframe udf function will be executed twice when filtering on new column created by withColumn

2016-05-11 Thread Ted Yu
In the master branch, the behavior is the same. Suggest opening a JIRA if you haven't done so. On Wed, May 11, 2016 at 6:55 AM, Tony Jin wrote: > Hi guys, > > I have a problem with Spark DataFrames. My Spark version is 1.6.1. > Basically, I used udf and df.withColumn to create a

dataframe udf function will be executed twice when filtering on new column created by withColumn

2016-05-11 Thread Tony Jin
Hi guys, I have a problem with Spark DataFrames. My Spark version is 1.6.1. Basically, I used udf and df.withColumn to create a "new" column, and then I filter the values on this new column and call show (an action). I see that the udf function (which is used by withColumn to create the new column) is
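A minimal sketch (Spark 1.6-style API) of the kind of reproduction being described; the println inside the udf makes the double evaluation visible in the logs. Column names and values here are illustrative:

    import org.apache.spark.sql.functions.udf
    import sqlContext.implicits._

    // Printing inside the udf makes each invocation visible. With the
    // filter below, the udf can fire twice per row -- once when the
    // predicate is evaluated and once for the projected output column,
    // because the optimizer duplicates the expression rather than
    // reusing the computed column value.
    val double = udf { x: Long =>
      println(s"udf evaluated for x=$x")
      x * 2
    }

    val df = sqlContext.range(5).withColumn("doubled", double($"id"))
    df.filter($"doubled" > 4).show()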

Re: Structured Streaming with Kafka source/sink

2016-05-11 Thread Ted Yu
Please see this thread: http://search-hadoop.com/m/q3RTt9XAz651PiG/Adhoc+queries+spark+streaming=Re+Adhoc+queries+on+Spark+2+0+with+Structured+Streaming > On May 11, 2016, at 1:47 AM, Ofir Manor wrote: > > Hi, > I'm trying out Structured Streaming from current 2.0

Structured Streaming with Kafka source/sink

2016-05-11 Thread Ofir Manor
Hi, I'm trying out Structured Streaming from the current 2.0 branch. Does the branch currently support Kafka as either a source or a sink? I couldn't find a specific JIRA or design doc for that in SPARK-8360 or in the examples... Is it still targeted for 2.0? Also, I naively assume it will look similar
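For what it's worth, the Kafka source did not ship with the initial 2.0 release; it arrived in later releases via the spark-sql-kafka-0-10 package (the sink later still), with an API roughly like this sketch. Broker addresses and topic names are placeholders, and spark is an active SparkSession:

    // Kafka source: read a stream of (key, value) records.
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      .option("subscribe", "events")
      .load()

    // Kafka sink: write transformed records back out. Kafka expects
    // string/binary key and value columns, hence the casts.
    val query = input
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      .option("topic", "events-out")
      .option("checkpointLocation", "/tmp/checkpoints/events")
      .start()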