Re: How to access line fileName in loading file using the textFile method

2018-09-26 Thread vermanurag
Spark has sc.wholeTextFiles(), which returns an RDD of tuples. The first element of each tuple is the file name and the second element is the file content.
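A minimal sketch in spark-shell, assuming an existing SparkContext sc and a hypothetical input directory /data/input:

    // Returns RDD[(String, String)]: (fileName, fileContent) pairs
    val files = sc.wholeTextFiles("/data/input")
    files.map { case (fileName, _) => fileName }.take(5).foreach(println)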

Re: How to run spark shell using YARN

2018-03-12 Thread vermanurag
This does not look like a Spark error. It looks like YARN has not been able to allocate resources for the Spark driver. If you check the Resource Manager UI you are likely to see the Spark application waiting for resources. Try reducing the driver memory, and address other bottlenecks based on what you see there.
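A hypothetical invocation with reduced resource requests (the flag values are placeholders to tune for your cluster):

    spark-shell --master yarn --deploy-mode client \
      --driver-memory 2g --executor-memory 2g --num-executors 2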

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-11 Thread vermanurag
Not sure why you are dividing by 1000. from_unixtime expects a long value holding the time in seconds since the Unix epoch, so if your ts column already holds seconds, the following should work: val ds = dataset.withColumn("hour", hour(from_unixtime(dataset.col("ts"))))
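For reference, a self-contained sketch with the required imports, assuming a hypothetical dataset whose ts column holds epoch seconds:

    import org.apache.spark.sql.functions.{from_unixtime, hour}
    val ds = dataset.withColumn("hour", hour(from_unixtime(dataset.col("ts"))))
    ds.select("ts", "hour").show(5)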

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-21 Thread vermanurag
Try to_json on the vector column. That should do it.
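A hedged sketch of one way to do this, assuming a hypothetical DataFrame df with an ML Vector column named features. Since to_json operates on struct/array/map columns, the vector is first unpacked into a plain array via a UDF:

    import org.apache.spark.sql.functions.{struct, to_json, udf}
    import org.apache.spark.ml.linalg.Vector

    // Unpack the Vector into Array[Double] so to_json can serialize it
    val vecToArray = udf((v: Vector) => v.toArray)
    val out = df.withColumn("features",
      to_json(struct(vecToArray(df("features")).as("values"))))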

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread vermanurag
If your dataframe has column types like Vector then you cannot save it as csv/text, as flat formats like csv/text have no direct equivalent for such types. You may need to convert the column appropriately (e.g. convert the incompatible column to StringType) before saving the output as csv.
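A minimal sketch of that conversion, assuming a hypothetical DataFrame df with a Vector column named features and a placeholder output path:

    import org.apache.spark.sql.functions.udf
    import org.apache.spark.ml.linalg.Vector

    // Render the Vector as a delimited string so the csv writer can handle it
    val vecToString = udf((v: Vector) => v.toArray.mkString(","))
    df.withColumn("features", vecToString(df("features")))
      .write.option("header", "true").csv("/tmp/out")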

Re: Spark Structured Streaming for Twitter Streaming data

2018-01-31 Thread vermanurag
Twitter functionality is not part of core Spark. We have successfully used the following package from Maven Central in the past: org.apache.bahir:spark-streaming-twitter_2.11:2.2.0. Earlier there used to be a Twitter package under Spark itself, but I find that it has not been updated beyond Spark 1.6.
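A short sketch against the Bahir connector (launched with --packages org.apache.bahir:spark-streaming-twitter_2.11:2.2.0), assuming twitter4j OAuth credentials are already set as system properties (twitter4j.oauth.consumerKey etc.):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    val ssc = new StreamingContext(sc, Seconds(10))
    // Passing None picks up OAuth credentials from the twitter4j system properties
    val tweets = TwitterUtils.createStream(ssc, None)
    tweets.map(_.getText).print()
    ssc.start()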

Re: How to hold some data in memory while processing rows in a DataFrame?

2018-01-22 Thread vermanurag
Looking at the description of the problem, window functions may solve your issue. They allow an operation over a window that can include records before/after the current record.
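A minimal sketch with hypothetical column names, computing an average over the previous, current and next row within each id partition:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.avg

    val w = Window.partitionBy("id").orderBy("ts").rowsBetween(-1, 1)
    val result = df.withColumn("local_avg", avg(df("value")).over(w))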