Re: How to access line fileName in loading file using the textFile method

2018-09-26 Thread vermanurag
Spark has sc.wholeTextFiles(), which returns an RDD of tuples. The first element of each tuple is the file name and the second element is the file content.
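A minimal sketch in spark-shell, assuming an existing SparkContext sc and a hypothetical input directory /data/input:

    // Returns RDD[(String, String)]: (fileName, fileContent) pairs
    val files = sc.wholeTextFiles("/data/input")
    files.map { case (fileName, _) => fileName }.take(5).foreach(println)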

Re: How to run spark shell using YARN

2018-03-12 Thread vermanurag
This does not look like a Spark error. It looks like YARN has not been able to allocate resources for the Spark driver. If you check the Resource Manager UI you are likely to see the Spark application waiting for resources. Try reducing the driver memory, and address other bottlenecks based on what you see there.
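A hypothetical invocation with reduced resource requests (the flag values are placeholders to tune for your cluster):

    spark-shell --master yarn --deploy-mode client \
      --driver-memory 2g --executor-memory 2g --num-executors 2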

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-11 Thread vermanurag
Not sure why you are dividing by 1000. from_unixtime expects a long value holding the time in seconds since the Unix epoch, so if your ts column already holds seconds, the following should work: val ds = dataset.withColumn("hour", hour(from_unixtime(dataset.col("ts"))))
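For reference, a self-contained sketch with the required imports, assuming a hypothetical dataset whose ts column holds epoch seconds:

    import org.apache.spark.sql.functions.{from_unixtime, hour}
    val ds = dataset.withColumn("hour", hour(from_unixtime(dataset.col("ts"))))
    ds.select("ts", "hour").show(5)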

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-21 Thread vermanurag
Try to_json on the vector column. That should do it.
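A hedged sketch of one way to do this, assuming a hypothetical DataFrame df with an ML Vector column named features. Since to_json operates on struct/array/map columns, the vector is first unpacked into a plain array via a UDF:

    import org.apache.spark.sql.functions.{struct, to_json, udf}
    import org.apache.spark.ml.linalg.Vector

    // Unpack the Vector into Array[Double] so to_json can serialize it
    val vecToArray = udf((v: Vector) => v.toArray)
    val out = df.withColumn("features",
      to_json(struct(vecToArray(df("features")).as("values"))))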

Re: Serialize a DataFrame with Vector values into text/csv file

2018-02-20 Thread vermanurag
If your dataframe has column types like Vector then you cannot save it as csv/text, as flat formats like csv/text have no direct equivalent for such types. You may need to convert the column appropriately (e.g. convert the incompatible column to StringType) before saving the output as csv.
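A minimal sketch of that conversion, assuming a hypothetical DataFrame df with a Vector column named features and a placeholder output path:

    import org.apache.spark.sql.functions.udf
    import org.apache.spark.ml.linalg.Vector

    // Render the Vector as a delimited string so the csv writer can handle it
    val vecToString = udf((v: Vector) => v.toArray.mkString(","))
    df.withColumn("features", vecToString(df("features")))
      .write.option("header", "true").csv("/tmp/out")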

Re: Spark Structured Streaming for Twitter Streaming data

2018-01-31 Thread vermanurag
Twitter functionality is not part of core Spark. We have successfully used the following package from Maven Central in the past: org.apache.bahir:spark-streaming-twitter_2.11:2.2.0. Earlier there used to be a Twitter package under Spark itself, but I find that it has not been updated beyond Spark 1.6.
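A short sketch against the Bahir connector (launched with --packages org.apache.bahir:spark-streaming-twitter_2.11:2.2.0), assuming twitter4j OAuth credentials are already set as system properties (twitter4j.oauth.consumerKey etc.):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    val ssc = new StreamingContext(sc, Seconds(10))
    // Passing None picks up OAuth credentials from the twitter4j system properties
    val tweets = TwitterUtils.createStream(ssc, None)
    tweets.map(_.getText).print()
    ssc.start()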

Re: How to hold some data in memory while processing rows in a DataFrame?

2018-01-22 Thread vermanurag
Looking at the description of the problem, window functions may solve your issue. They allow an operation over a window that can include records before/after the current record.
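A minimal sketch with hypothetical column names, computing an average over the previous, current and next row within each id partition:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.avg

    val w = Window.partitionBy("id").orderBy("ts").rowsBetween(-1, 1)
    val result = df.withColumn("local_avg", avg(df("value")).over(w))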