Re: how to add new column using regular expression within pyspark dataframe

2017-04-19 Thread Yan Facai
How about using `withColumn` and a UDF? Examples: https://gist.github.com/zoltanctoth/2deccd69e3d1cde1dd78 and https://ragrawal.wordpress.com/2015/10/02/spark-custom-udf-example/ On Mon, Apr 17, 2017 at 8:25 PM, Zeming Yu
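For reference, the pattern from those links in a minimal sketch; the DataFrame, column names, and regex here are invented for illustration, not taken from the original question:

```python
import re

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Toy data standing in for the poster's DataFrame.
df = spark.createDataFrame([("QF123",), ("VA45",)], ["flight_no"])

# Wrap a plain Python regex function as a UDF returning a string.
extract_digits = udf(
    lambda s: re.sub(r"[^0-9]", "", s) if s is not None else None,
    StringType(),
)

df.withColumn("flight_digits", extract_digits(df["flight_no"])).show()
```

If the pattern is simple, the built-in `regexp_extract` in `pyspark.sql.functions` does the same job without the overhead of a Python UDF.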

JDBC write error of Pyspark dataframe

2017-04-19 Thread Cinyoung Hur
Hi, I'm trying to write a dataframe to MariaDB. I got this error message, but I have no clue. Please give some advice. Py4JJavaError Traceback (most recent call last) in () > 1 result1.filter(result1["gnl_nm_set"] == "").count() /usr/local/linewalks/spark/spark/python/pyspark/sql/dataframe.pyc
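The preview cuts off before the actual exception, and the traceback shown is from a filter/count rather than the write itself. For reference, a typical JDBC write to MariaDB from PySpark looks like the sketch below; the URL, table, credentials, and toy data are placeholder assumptions, and the MariaDB connector JAR must be on the driver/executor classpath (e.g. via --jars):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the poster's result1 DataFrame.
result1 = spark.createDataFrame([("aspirin",), ("",)], ["gnl_nm_set"])

result1.write.jdbc(
    url="jdbc:mariadb://localhost:3306/mydb",      # placeholder host/database
    table="result_table",                          # placeholder table name
    mode="append",
    properties={
        "user": "spark",                           # placeholder credentials
        "password": "secret",
        "driver": "org.mariadb.jdbc.Driver",       # requires the MariaDB connector JAR
    },
)
```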

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
works now! thanks much! On Wed, Apr 19, 2017 at 2:05 PM, kant kodali wrote: > oops my bad. I see it now! sorry. > > On Wed, Apr 19, 2017 at 1:56 PM, Marcelo Vanzin > wrote: > >> I see a bunch of getOrCreate methods in that class. They were all >> added

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
oops my bad. I see it now! sorry. On Wed, Apr 19, 2017 at 1:56 PM, Marcelo Vanzin wrote: > I see a bunch of getOrCreate methods in that class. They were all > added in SPARK-6752, a long time ago. > > On Wed, Apr 19, 2017 at 1:51 PM, kant kodali wrote:

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread Marcelo Vanzin
I see a bunch of getOrCreate methods in that class. They were all added in SPARK-6752, a long time ago. On Wed, Apr 19, 2017 at 1:51 PM, kant kodali wrote: > There is no getOrCreate for JavaStreamingContext however I do use > JavaStreamingContext inside

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
There is no *getOrCreate* for JavaStreamingContext; however, I do use JavaStreamingContext inside createStreamingContext() in my code from the previous email. On Wed, Apr 19, 2017 at 1:46 PM, Marcelo Vanzin wrote: > Why are you not using JavaStreamingContext if you're writing

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread Marcelo Vanzin
Why are you not using JavaStreamingContext if you're writing Java? On Wed, Apr 19, 2017 at 1:42 PM, kant kodali wrote: > Hi All, > > I get the following errors whichever way I try either lambda or generics. I > am using > spark 2.1 and scalla 2.11.8 > > > StreamingContext ssc

Problem with Java and Scala interoperability // streaming

2017-04-19 Thread kant kodali
Hi All, I get the following errors whichever way I try, either lambda or generics. I am using Spark 2.1 and Scala 2.11.8. StreamingContext ssc = StreamingContext.getOrCreate(hdfsCheckpointDir, () -> { return createStreamingContext(); }, null, false); ERROR StreamingContext ssc =

Re: java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread Nicholas Hakobian
CDH 5.5 only provides Spark 1.5. Are you managing your PySpark install separately? For something like your example, you will get significantly better performance using coalesce with a lit, like so: from pyspark.sql.functions import lit, coalesce def replace_empty(icol): return
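The suggestion is cut off mid-function in the archive. A minimal sketch of the coalesce-with-lit pattern, written against the Spark 1.6-era API used in the thread and with a toy DataFrame standing in for the poster's data:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import coalesce, col, lit

sc = SparkContext()
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([("a",), (None,)], ["gnl_nm_set"])

# coalesce returns its first non-null argument, so NULLs become "" inside
# the JVM; no rows are serialized out to a Python UDF.
def replace_empty(icol):
    return coalesce(col(icol), lit("")).alias(icol)

df.select([replace_empty(c) for c in df.columns]).show()
```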

Re: Handling skewed data

2017-04-19 Thread Richard Siebeling
I'm also interested in this; does anyone know? On 17 April 2017 at 17:17, Vishnu Viswanath wrote: > Hello All, > > Does anyone know if the skew handling code mentioned in this talk > https://www.youtube.com/watch?v=bhYV0JOPd9Y was added to spark? > > If so can I
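The question is about a specific patch from that talk, which this preview does not answer. Independent of it, a manual key-salting join is a commonly used workaround for a skewed join key; below is a rough sketch with invented table and column names and an arbitrary salt count:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import array, col, concat_ws, explode, floor, lit, rand

spark = SparkSession.builder.getOrCreate()

# Toy data: "hot" is the skewed key on the large side.
big_df = spark.createDataFrame([("hot", 1)] * 5 + [("cold", 2)], ["key", "v"])
small_df = spark.createDataFrame([("hot", "x"), ("cold", "y")], ["key", "w"])

N = 10  # number of salt buckets; a tuning assumption, not a Spark setting

# Large, skewed side: append a random salt 0..N-1 to the join key.
big_salted = big_df.withColumn(
    "salted_key", concat_ws("_", col("key"), floor(rand() * N).cast("string"))
)

# Small side: replicate each row N times, once per salt value.
small_salted = (
    small_df
    .withColumn("salt", explode(array([lit(i) for i in range(N)])))
    .withColumn("salted_key", concat_ws("_", col("key"), col("salt").cast("string")))
)

# Joining on the salted key spreads the "hot" rows across N partitions.
joined = big_salted.join(small_salted.drop("key", "salt"), "salted_key")
joined.show()
```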

Re: java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread issues solution
PySpark 1.6 on Cloudera 5.5 (YARN). 2017-04-19 13:42 GMT+02:00 issues solution : > Hi, > can someone tell me why I get the following error when applying a UDF like this: > > def replaceCempty(x): > if x is None: > return "" > else: > return

Real time incremental Update to Spark Graphs.

2017-04-19 Thread Siddharth Ubale
Hi, We have a scenario where we want to build a graph and perform analytics on it. However, once the graph is built and cached in Spark memory, analytics can only run on the already-present graph. We would like the graph to be updated incrementally in real time as and when we receive new edges and vertices for
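The message is truncated, and GraphX offers no built-in incremental-update API. One commonly suggested workaround, sketched here under the assumption that the GraphFrames package is available (the poster does not mention it) and with invented data, is to keep vertices and edges as DataFrames, union in each new batch, and rebuild the graph:

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame  # assumes the graphframes package is installed

spark = SparkSession.builder.getOrCreate()

vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "rel"])
g = GraphFrame(vertices, edges)

# A new batch arrives (e.g. from a streaming sink): union, dedupe, rebuild.
new_vertices = spark.createDataFrame([("c", "Carol")], ["id", "name"])
new_edges = spark.createDataFrame([("b", "c", "follows")], ["src", "dst", "rel"])

g = GraphFrame(
    vertices.union(new_vertices).dropDuplicates(["id"]),
    edges.union(new_edges).dropDuplicates(["src", "dst", "rel"]),
)
```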

java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread issues solution
Hi, can someone tell me why I get the following error when applying a UDF like this: def replaceCempty(x): if x is None: return "" else: return x.encode('utf-8') udf_replaceCempty = F.udf(replaceCempty, StringType()) dfTotaleNormalize53 = dfTotaleNormalize52.select([i if i
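For reference, a complete, runnable version of this UDF pattern under PySpark 1.6 / Python 2 might look like the sketch below. The toy DataFrame and the "apply only to string columns" condition are assumptions, since the original select() is cut off in the archive; Nicholas Hakobian's reply earlier in this digest suggests replacing the Python UDF entirely with coalesce and lit.

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

sc = SparkContext()
sqlContext = SQLContext(sc)

# Toy stand-in for dfTotaleNormalize52.
df = sqlContext.createDataFrame([(u"a", 1), (None, 2)], ["txt", "n"])

def replaceCempty(x):
    # Turn NULLs into empty strings; on Python 2, encode returns a str.
    if x is None:
        return ""
    return x.encode("utf-8")

udf_replaceCempty = F.udf(replaceCempty, StringType())

# Guess at the truncated list comprehension: apply the UDF to string columns
# only, pass other columns through unchanged.
df2 = df.select([
    udf_replaceCempty(F.col(c)).alias(c) if t == "string" else F.col(c)
    for c, t in df.dtypes
])
df2.show()
```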