[Structured Streaming]: Structured Streaming into Redshift sink

2018-01-18 Thread Somasundaram Sekar
Is it possible to write a DataFrame backed by a Kafka streaming source into AWS Redshift? We have in the past used https://github.com/databricks/spark-redshift to write into Redshift, but I presume it will not work with *writeStream*. Writing with a JDBC connector through a ForeachWriter may also…

Writing to Redshift from Kafka Streaming source

2018-01-18 Thread Somasundaram Sekar
Hi, Is it possible to write a DataFrame backed by a Kafka streaming source into AWS Redshift? We have in the past used https://github.com/databricks/spark-redshift to write into Redshift, but I presume it will not work with DataFrame#writeStream(). Writing with a JDBC connector…
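
A minimal sketch of the ForeachWriter route mentioned above, assuming a hypothetical JDBC URL and a two-column target table (per-row INSERTs are slow on Redshift, so staging to S3 and issuing a COPY is usually preferable):

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sink: inserts each streaming row into Redshift over JDBC.
class RedshiftForeachWriter(url: String, user: String, password: String)
    extends ForeachWriter[Row] {
  var conn: Connection = _
  var stmt: PreparedStatement = _

  override def open(partitionId: Long, version: Long): Boolean = {
    conn = DriverManager.getConnection(url, user, password)
    stmt = conn.prepareStatement("INSERT INTO events (key, value) VALUES (?, ?)")
    true
  }

  override def process(row: Row): Unit = {
    stmt.setString(1, row.getString(0))
    stmt.setString(2, row.getString(1))
    stmt.executeUpdate()
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (stmt != null) stmt.close()
    if (conn != null) conn.close()
  }
}

// Usage, assuming kafkaDf is the streaming DataFrame read from the Kafka source:
// kafkaDf.writeStream.foreach(new RedshiftForeachWriter(jdbcUrl, user, pass)).start()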

Re: learning Spark

2017-12-04 Thread Somasundaram Sekar
Learning Spark (O'Reilly) as a starter, plus the official docs. On 4 Dec 2017 9:19 am, "Manuel Sopena Ballesteros" wrote: > Dear Spark community, is there any resource (books, online course, etc.) available that you know of to learn about Spark? I am…

Equivalent of Redshift ListAgg function in Spark (PySpark)

2017-10-08 Thread Somasundaram Sekar
Hi, I want to concat multiple columns into a single column after grouping the DataFrame; I want a functional equivalent of the Redshift LISTAGG function: LISTAGG(column, '|') WITHIN GROUP (ORDER BY column) AS name. LISTAGG function: for each group in a query, the…
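
A sketch of one way to approximate LISTAGG with Spark SQL functions (the DataFrame and column names are hypothetical; sort_array stands in for WITHIN GROUP (ORDER BY ...), since collect_list alone gives no ordering guarantee). The PySpark API is analogous:

import org.apache.spark.sql.functions.{collect_list, concat_ws, sort_array}

// Group, gather the values into an array, sort to mimic the ORDER BY,
// then join the elements with the '|' delimiter.
val listAgg = df
  .groupBy("grp")
  .agg(concat_ws("|", sort_array(collect_list("column"))).as("name"))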

DataFrame multiple agg on the same column

2017-10-07 Thread Somasundaram Sekar
Hi, I have a GroupedData object on which I perform aggregation of a few columns. Since the Map form of agg takes one function per column name, I cannot perform multiple aggregates on the same column, say both max and min of amount. So the line of code below returns only one aggregate per column…
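
A sketch of the usual workaround: pass Column expressions to agg instead of a Map, since a Map can hold only one function per column name (the DataFrame and column names are hypothetical):

import org.apache.spark.sql.functions.{max, min}

// Map("amount" -> "max", "amount" -> "min") keeps only one entry per key,
// so use Column expressions to get both aggregates of the same column.
val result = df
  .groupBy("customer")
  .agg(max("amount").as("max_amount"), min("amount").as("min_amount"))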

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
…you will get an RDD of arrays. > What is your expected outcome of the 2nd map? > On Mon, Sep 5, 2016 at 11:30 PM, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote: > Thank you sir. This is what I get: scala> textFile.map(x => x.split(","))…

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
…:27: error: value getString is not a member of Array[String] > textFile.map(x => x.split(",")).map(x => (x.getString(0))) > regards > On Monday, 5 September 2016, 13:51, Somasundaram Sekar <somasundar.se…

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
Basic error: transformations like map give you back an RDD, e.g. sc.textFile("filename").map(x => x.split(",")). On 5 Sep 2016 6:19 pm, "Ashok Kumar" wrote: > Hi, I have a text file as below that I read in: 74,20160905-133143,98.11218069128827594148…
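
A sketch of the fix discussed in this thread: after split each record is an Array[String], not a Row, so index it directly instead of calling getString (the file name is hypothetical):

// map(_.split(",")) yields RDD[Array[String]]; arr(0) is the first column.
val firstCols = sc.textFile("filename")
  .map(_.split(","))
  .map(arr => arr(0))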

Resources for learning Spark administration

2016-09-04 Thread Somasundaram Sekar
Please suggest some good resources to learn Spark administration.

Re: Spark transformations

2016-09-04 Thread Somasundaram Sekar
Can you try this: https://www.linkedin.com/pulse/hive-functions-udfudaf-udtf-examples-gaurav-singh On 4 Sep 2016 9:38 pm, "janardhan shetty" wrote: > Hi, is there any chance that we can send multiple columns to a UDF and generate a new column for Spark ML?…
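
A sketch of passing several columns into one UDF to derive a new column for Spark ML (the function body and column names are hypothetical):

import org.apache.spark.sql.functions.{col, udf}

// A UDF may take multiple arguments; each argument binds to one column.
val combine = udf((a: Double, b: Double) => a * b)
val withFeature = df.withColumn("feature", combine(col("col1"), col("col2")))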

Re: Importing large file with SparkContext.textFile

2016-09-03 Thread Somasundaram Sekar
…if the format is splittable, say TSV, CSV, etc., it will be distributed across all executors. On Sat, Sep 3, 2016 at 3:38 PM, Somasundaram Sekar <somasundar.sekar@tigeranalytics.com> wrote: > Hi All, would like to gain some understanding on the questions listed below…
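
A sketch illustrating the point above: for a splittable format, textFile yields multiple partitions that executors process in parallel, whereas a gzipped file yields a single partition (the path and partition hint are hypothetical):

// The second argument is a minimum-partitions hint, not an exact count.
val rdd = sc.textFile("somefile.tsv", 8)
println(rdd.getNumPartitions) // the parallelism actually obtained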

Importing large file with SparkContext.textFile

2016-09-03 Thread Somasundaram Sekar
Hi All, Would like to gain some understanding on the questions listed below: 1. When processing a large file with Apache Spark, with, say, sc.textFile("somefile.xml"), does it split the file for parallel processing across executors, or will it be processed as a single chunk in a single…