Re: Spark Data Frame Writer - Range Partitioning

2017-07-25 Thread Jain, Nishit
But wouldn't partitioning by a column partition the data only in the Spark RDD? Would it also partition the data on disk when it is written (dividing the data into folders)?
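For reference, partitionBy on the DataFrameWriter does control the on-disk layout, not just the in-memory partitioning; each distinct value becomes its own directory. A minimal sketch (the output path and format are placeholders, not from this thread):

  // Each distinct value of column_a becomes a directory under the output path.
  df.write
    .partitionBy("column_a")
    .parquet("/tmp/output")
  // Layout on disk: /tmp/output/column_a=1/part-..., /tmp/output/column_a=2/part-..., ...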

Spark Data Frame Writer - Range Partitioning

2017-07-21 Thread Jain, Nishit
Is it possible to have the Spark Data Frame Writer write based on RangePartitioning? For example, I have 10 distinct values for column_a, say 1 to 10. df.write .partitionBy("column_a") By default, the code above will create 10 folders: column_a=1, column_a=2 ... column_a=10. I want to see if it is possible
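partitionBy itself has no range mode; a common workaround (a sketch, not from this thread; the bucket boundaries and column names are assumptions) is to derive a bucket column and partition on that instead:

  import org.apache.spark.sql.functions.{col, when}

  // Collapse column_a (1..10) into two hypothetical range buckets, then
  // partition the output on the derived column instead of column_a itself.
  val bucketed = df.withColumn(
    "range_bucket",
    when(col("column_a") <= 5, "1-5").otherwise("6-10"))

  bucketed.write
    .partitionBy("range_bucket")  // writes range_bucket=1-5/ and range_bucket=6-10/
    .parquet("/tmp/output")       // placeholder path and format

This trades the 10 value-based folders for 2 range-based ones; the bucket values end up in the directory names rather than the data files.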

Re: Spark Streaming Job Stuck

2017-06-06 Thread Jain, Nishit
That helped, thanks TD! :D

Spark Streaming Job Stuck

2017-06-05 Thread Jain, Nishit
I have a very simple Spark Streaming job running locally in standalone mode. There is a custom receiver which reads from a database and passes the data to the main job, which prints the total. Not an actual use case, but I am playing around to learn. The problem is that the job gets stuck forever; the logic is very
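A stuck receiver-based job is often a core-count issue: the receiver permanently occupies one core, so running with master local[1] leaves nothing for batch processing (local[2] or more is needed). A minimal sketch of such a custom receiver, with the database read stubbed out (class and thread names are placeholders):

  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.receiver.Receiver

  class DbReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {

    override def onStart(): Unit = {
      // Receive on a separate thread so onStart() returns immediately.
      new Thread("db-poller") {
        override def run(): Unit = {
          while (!isStopped()) {
            store("row-from-db")  // stub for the actual database read
            Thread.sleep(1000)
          }
        }
      }.start()
    }

    override def onStop(): Unit = {
      // Nothing to do; the polling thread exits once isStopped() is true.
    }
  }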

conf dir missing

2017-05-19 Thread Jain, Nishit
Anyone else facing this issue? I pulled the 0.7.0 release from http://ranger.apache.org/download.html, built it, and found there was no conf folder: ranger-0.7.0-admin/ews/webapp/WEB-INF/classes -> ls META-INF db_message_bundle.properties org resourcenamemap.properties conf.dist log4jdbc.properties

Re: PySpark: [Errno 8] nodename nor servname provided, or not known

2016-12-19 Thread Jain, Nishit
Found it. Somehow my host mapping was messing it up. Changing it to point to localhost worked: /etc/hosts #127.0.0.1 XX.com 127.0.0.1 localhost

PySpark: [Errno 8] nodename nor servname provided, or not known

2016-12-19 Thread Jain, Nishit
Hi, I am using the pre-built 'spark-2.0.1-bin-hadoop2.7' and when I try to start pyspark, I get the following message. Any ideas what could be wrong? I tried using python3 and setting SPARK_LOCAL_IP to 127.0.0.1, but got the same error. ~ -> cd /Applications/spark-2.0.1-bin-hadoop2.7/bin/

Spark AVRO S3 read not working for partitioned data

2016-11-17 Thread Jain, Nishit
When I read a specific file it works: val filePath = "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/hr=19/00" val df = spark.read.avro(filePath) But if I point to a folder to read date-partitioned data it fails: val filePath = "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/" I get this error:
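One thing worth trying (a sketch, not confirmed as the fix in this thread) is pointing the reader at the partition directory while declaring the table root via the basePath option, so the dt= and hr= folders are discovered as partition columns. This assumes the databricks spark-avro package that provides the .avro() reader used above:

  import com.databricks.spark.avro._

  // basePath tells Spark where the partitioned table starts, so the
  // dt= and hr= directory levels are parsed as partition columns.
  val df = spark.read
    .option("basePath", "s3n://bucket_name/f1/f2/avro/")
    .avro("s3n://bucket_name/f1/f2/avro/dt=2016-10-19/")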

Re: How do I convert a data frame to broadcast variable?

2016-11-04 Thread Jain, Nishit
Awesome, thanks Silvio!

Re: How do I convert a data frame to broadcast variable?

2016-11-03 Thread Jain, Nishit

How do I convert a data frame to broadcast variable?

2016-11-03 Thread Jain, Nishit
I have a lookup table in a HANA database. I want to create a Spark broadcast variable for it. What would be the suggested approach? Should I read it as a data frame and convert the data frame into a broadcast variable? Thanks, Nishit
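One approach (a sketch; the JDBC URL, table name, and column positions are assumptions for illustration): since a DataFrame is a distributed handle rather than a local value, the usual pattern is to collect the small lookup table to the driver as a Map and broadcast that:

  // Read the HANA lookup table over JDBC (connection details are hypothetical;
  // a driver option may also be needed).
  val lookupDF = spark.read
    .format("jdbc")
    .option("url", "jdbc:sap://hana-host:30015")
    .option("dbtable", "LOOKUP_TABLE")
    .load()

  // Collect to the driver (safe only for small tables) and broadcast.
  val lookupMap: Map[String, String] = lookupDF
    .collect()
    .map(row => row.getString(0) -> row.getString(1))
    .toMap

  val bcLookup = spark.sparkContext.broadcast(lookupMap)
  // Inside transformations: bcLookup.value.get(someKey)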

Re: CSV escaping not working

2016-10-27 Thread Jain, Nishit

Re: CSV escaping not working

2016-10-27 Thread Jain, Nishit
Do you mind sharing why escaping should not work without quotes?

Re: CSV escaping not working

2016-10-27 Thread Jain, Nishit
Interesting finding: Escaping works if data is quoted but not otherwise.

CSV escaping not working

2016-10-27 Thread Jain, Nishit
I am using spark-core version 2.0.1 with Scala 2.11. I have simple code to read a CSV file which has \ escapes: val myDA = spark.read .option("quote", null) .schema(mySchema) .csv(filePath) As per the documentation, \ is the default escape for the CSV reader, but it does not work. Spark is
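Tying this to the finding above that escaping only worked on quoted data: a sketch of a reader that keeps the default quote character enabled so the escape takes effect (mySchema and filePath as in the snippet above):

  // Per the finding in this thread, escaping applied only to quoted fields,
  // so the default quote (") is left enabled rather than set to null.
  val myDF = spark.read
    .option("quote", "\"")   // default quote character, kept enabled
    .option("escape", "\\")  // backslash escape, the documented default
    .schema(mySchema)
    .csv(filePath)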