Re: A question about rdd transformation

2017-06-22 Thread Lionel Luffy
Now I've found that the root cause is a Wrapper class in AnyRef that is not Serializable, but even though I changed it to implement Serializable, the 'rows' still cannot get data... Any suggestions? On Fri, Jun 23, 2017 at 10:56 AM, Lionel Luffy wrote: > Hi there, > I'm trying to do

A question about rdd transformation

2017-06-22 Thread Lionel Luffy
Hi there, I'm trying to do the action below, but it always returns java.io.NotSerializableException in the shuffle task. I've checked that Array is serializable. How can I get the data of the rdd in newRDD? step 1: val rdd: RDD[(AnyRef, Array[AnyRef])] {..} step 2: rdd
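
A minimal Scala sketch of the failure mode being described, with a hypothetical Wrapper class standing in for the poster's non-serializable type; the key point is that everything a shuffled RDD holds must extend Serializable:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    // Hypothetical wrapper: without "extends Serializable", any shuffle of
    // an RDD holding it fails with java.io.NotSerializableException.
    class Wrapper(val value: String) extends Serializable

    object SerializableWrapperDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("demo").setMaster("local[2]"))
        val rdd: RDD[(AnyRef, Array[AnyRef])] = sc.parallelize(Seq(
          ("k1": AnyRef, Array[AnyRef](new Wrapper("a"), new Wrapper("b")))))
        // reduceByKey shuffles, which serializes both keys and values, so
        // the wrapper (and everything it references) must be Serializable.
        val newRDD = rdd.reduceByKey(_ ++ _)
        newRDD.collect().foreach { case (k, rows) => println(s"$k -> ${rows.length}") }
        sc.stop()
      }
    }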

Re: Using Spark with Local File System/NFS

2017-06-22 Thread Michael Mior
If you put a * in the path, Spark will look for a file or directory named *. To read all the files in a directory, just remove the star. -- Michael Mior michael.m...@gmail.com On Jun 22, 2017 17:21, "saatvikshah1994" wrote: > Hi, > > I've downloaded and kept the same
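
A minimal sketch of the directory form being suggested, assuming a Spark 2.x session named spark and the path from the original post present on every node:

    // Passing the directory itself reads every file inside it; the
    // file:// scheme forces the local filesystem rather than HDFS.
    val lines = spark.sparkContext.textFile("file:///home/xyzuser/data/")
    println(lines.count())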

Using Spark with Local File System/NFS

2017-06-22 Thread saatvikshah1994
Hi, I've downloaded and kept the same set of data files on all my cluster nodes, in the same absolute path - say /home/xyzuser/data/*. I am now trying to perform an operation (say open(filename).read()) on all these files in Spark, but by passing local file paths. I was under the assumption that

Spark submit - org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run error

2017-06-22 Thread Kanagha Kumar
Hi, I am *intermittently* seeing this error while doing spark-submit with Spark 2.0.2 (Scala 2.11). I see the same issue reported in https://issues.apache.org/jira/browse/SPARK-18343 and it seems to be RESOLVED. I can run successfully most of the time, though. Hence I'm unsure if it is

Re: Exception when using ReduceByKeyAndWindow in Spark Streaming.

2017-06-22 Thread Tathagata Das
Unfortunately, I am out of ideas. I don't know what's going wrong. If you can, try using Structured Streaming. We are more active on the Structured Streaming project. On Thu, Jun 22, 2017 at 4:07 PM, swetha kasireddy wrote: > Hi TD, > > I am still seeing this issue
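
For reference, a minimal Structured Streaming sketch of a windowed distinct-visitor count, as one possible translation of the reduceByKeyAndWindow logic below; the socket source, column names, and window sizes are all assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("visitors").getOrCreate()
    import spark.implicits._

    // Assumed input: one visitor id per line from a socket source.
    val visits = spark.readStream
      .format("socket").option("host", "localhost").option("port", 9999)
      .load()
      .select(current_timestamp().as("ts"), $"value".as("visitorId"))

    // A sliding window replaces the manual set-union / inverse-reduce
    // bookkeeping that reduceByKeyAndWindow requires with DStreams.
    val counts = visits
      .withWatermark("ts", "10 minutes")
      .groupBy(window($"ts", "10 minutes", "5 minutes"))
      .agg(approx_count_distinct("visitorId").as("visitors"))

    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()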

Re: Exception when using ReduceByKeyAndWindow in Spark Streaming.

2017-06-22 Thread swetha kasireddy
Hi TD, I am still seeing this issue with any immutable data structure. Any idea why this happens? I use scala.collection.immutable.List[String], and my reduce and inverse reduce do the following: visitorSet1 ++ visitorSet2 and visitorSet1.filterNot(visitorSet2.contains(_)). On Wed, Jun 7, 2017
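
A sketch of the reduce / inverse-reduce pair as described above, assuming a DStream keyed by some id with List[String] visitor sets as values (all names and durations are assumptions):

    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.dstream.DStream

    type Visitors = List[String]

    // Forward reduce: merge the visitor sets of two batches.
    def reduceFunc(a: Visitors, b: Visitors): Visitors = a ++ b

    // Inverse reduce: subtract the batch that slid out of the window.
    def invReduceFunc(acc: Visitors, old: Visitors): Visitors =
      acc.filterNot(old.contains(_))

    // Note: the inverse-reduce form requires checkpointing to be enabled
    // on the StreamingContext (ssc.checkpoint(...)).
    def windowed(pairs: DStream[(String, Visitors)]): DStream[(String, Visitors)] =
      pairs.reduceByKeyAndWindow(reduceFunc _, invReduceFunc _,
        windowDuration = Seconds(600), slideDuration = Seconds(60))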

Re: Merging multiple Pandas dataframes

2017-06-22 Thread Saatvik Shah
Hi Assaf, Thanks for your suggestion. I also found one other improvement, which is to iteratively convert Pandas DFs to RDDs and take a union of those (similar to DataFrames). Basically, calling createDataFrame is heavy, and checkpointing of DataFrames is a brand-new feature. Instead create a huge

Re: Trouble with PySpark UDFs and SPARK_HOME only on EMR

2017-06-22 Thread Nicholas Chammas
Here’s a repro for a very similar issue where Spark hangs on the UDF, which I think is related to the SPARK_HOME issue. I posted the repro on the EMR forum, but in case you can’t access it: 1. I’m running EMR 5.6.0, Spark 2.1.1, and

Re: Flume DStream produces 0 records after HDFS node killed

2017-06-22 Thread N B
This issue got resolved. I was able to trace it to the fact that the driver program's pom.xml was pulling in Spark 2.1.1, which in turn was pulling in Hadoop 2.2.0. Explicitly adding dependencies on the Hadoop 2.7.3 libraries resolved it. The following API in HDFS:
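
The poster pinned the versions in a Maven pom.xml; as a sketch of the same idea in sbt syntax (the exact artifacts a given build needs are an assumption):

    // build.sbt: pin the Hadoop client explicitly so the transitive
    // Hadoop 2.2.0 pulled in via Spark does not win dependency resolution.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-streaming" % "2.1.1" % "provided",
      "org.apache.hadoop" %  "hadoop-client"   % "2.7.3"
    )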

Unsubscribe

2017-06-22 Thread LisTree Team
LisTree Team - Big Data Training Team - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Trouble with PySpark UDFs and SPARK_HOME only on EMR

2017-06-22 Thread Nick Chammas
I’m seeing a strange issue on EMR which I posted about here. In brief, when I try to import a UDF I’ve defined, Python somehow fails to find Spark. This exact code works for me locally and works on our on-premises CDH

How does Spark deal with Data Skewness?

2017-06-22 Thread Sea aj
Hi everyone, I have read about some interesting ideas on how to manage skew, but I was not sure whether any of these techniques are used in the Spark 2.x versions or not. To name a few, "Salting the Data" and "Dynamic Repartitioning" are techniques introduced in Spark Summits. I am really curious to
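
Salting, at least, is easy to apply by hand in any Spark version; a minimal sketch of the idea (the bucket count of 10 is an arbitrary choice):

    import scala.util.Random
    import org.apache.spark.rdd.RDD

    // Spread a hot key across N sub-keys, pre-aggregate, then strip the salt.
    def saltedCount(keys: RDD[String], buckets: Int = 10): RDD[(String, Long)] =
      keys
        .map(k => ((k, Random.nextInt(buckets)), 1L)) // salt the key
        .reduceByKey(_ + _)                           // partial aggregation per sub-key
        .map { case ((k, _), n) => (k, n) }           // drop the salt
        .reduceByKey(_ + _)                           // final merge per original key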

Re: Re: Re: spark2.1 kafka0.10

2017-06-22 Thread lk_spark
Thank you Kumar, I will try it later. 2017-06-22 lk_spark From: Pralabh Kumar Sent: 2017-06-22 20:20 Subject: Re: Re: spark2.1 kafka0.10 To: "lk_spark" Cc: "user.spark" It looks like the replicas for your partitions are failing. If

Re: Why my project has this kind of error?

2017-06-22 Thread satish lalam
Minglei - You could check your JDK path and Scala library settings in the project structure: open the project view (Alt+1), then press F4 to open Project Structure... look under SDKs and Libraries. On Mon, Jun 19, 2017 at 10:54 PM, 张明磊 wrote: > Hello to all, > > Below

Re: Re: spark2.1 kafka0.10

2017-06-22 Thread Pralabh Kumar
It looks like the replicas for your partitions are failing. If you have more brokers, can you try increasing the replicas, just to make sure at least one leader is always available? On Thu, Jun 22, 2017 at 10:34 AM, lk_spark wrote: > each topic has 5 partitions, 2 replicas. > >

Re: Broadcasts & Storage Memory

2017-06-22 Thread Pralabh Kumar
Hi, broadcast variables are definitely stored in the spark.memory.storageFraction pool. 1. If we go into the code of TorrentBroadcast.scala, the writeBlocks method navigates through BlockManager to MemoryStore. Deserialization of the variables occurs in unroll memory, and the result is then transferred to storage memory.
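
As a reference point for the path described above, the user-facing side is just sc.broadcast; a minimal sketch, assuming Spark 2.x defaults:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("bcast").master("local[2]").getOrCreate()
    val sc = spark.sparkContext

    // TorrentBroadcast.writeBlocks stores the serialized blocks via
    // BlockManager; on first access each executor unrolls the deserialized
    // value into storage memory (the pool sized by
    // spark.memory.storageFraction).
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

    val total = sc.parallelize(Seq("a", "b", "a"))
      .map(k => lookup.value.getOrElse(k, 0))
      .sum()
    println(total)
    spark.stop()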