But wouldn’t the partitioning column partition the data only in the Spark RDD?
Would it also partition the data on disk when it is written (dividing data into
folders)?
From: ayan guha <guha.a...@gmail.com>
Date: Friday, July 21, 2017 at 3:25 PM
To: "J
Is it possible to have the Spark DataFrame writer write based on RangePartitioning?
For Ex -
I have 10 distinct values for column_a, say 1 to 10.
df.write
  .partitionBy("column_a")
The code above will by default create 10 folders: column_a=1, column_a=2,
..., column_a=10.
I want to see if it is possible
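One way to approximate a range-partitioned write (a sketch, assuming Spark 2.3+ where `Dataset.repartitionByRange` is available; `outputPath` is a hypothetical placeholder) is to range-partition the rows in memory first. Note that `partitionBy` still controls only the on-disk folder layout:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical output location
val outputPath = "/tmp/range_partitioned"

// Range-partition rows in memory so each task holds a contiguous
// range of column_a values, then let partitionBy lay out the folders.
df.repartitionByRange(10, col("column_a"))
  .write
  .partitionBy("column_a")
  .parquet(outputPath)
```

With 10 distinct values and 10 range partitions, each folder's files then come from roughly one task, but the folder structure itself is unchanged from plain `partitionBy`.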
That helped, thanks TD! :D
From: Tathagata Das <tathagata.das1...@gmail.com>
Date: Tuesday, June 6, 2017 at 3:26 AM
To: "Jain, Nishit" <nja...@underarmour.com>
Cc: "user@spark.apache.or
I have a very simple Spark Streaming job running locally in standalone mode.
There is a custom receiver which reads from a database and passes the data to
the main job, which prints the total. Not an actual use case, but I am playing
around to learn. The problem is that the job gets stuck forever; the logic is very
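For reference, a custom receiver is usually structured like this (a minimal sketch; `DbReceiver` and the database-fetch call are hypothetical placeholders, not from the thread). Two common causes of a job that appears stuck are blocking inside `onStart()` instead of spawning a worker thread, and, in local mode, running with fewer cores than `local[2]`, since the receiver occupies one core:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Minimal custom-receiver sketch. onStart() must return quickly:
// spawn a thread for the read loop rather than blocking in place.
class DbReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {

  def onStart(): Unit = {
    new Thread("db-receiver") {
      override def run(): Unit = receiveLoop()
    }.start()
  }

  // No cleanup needed here: the loop below checks isStopped().
  def onStop(): Unit = {}

  private def receiveLoop(): Unit = {
    while (!isStopped()) {
      // fetchRowsFromDb() is a placeholder for the real database read
      // fetchRowsFromDb().foreach(store)
      Thread.sleep(1000)
    }
  }
}
```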
Found it. Somehow my host mapping was messing it up. Changing it to point to
localhost worked:
/etc/hosts
#127.0.0.1 XX.com
127.0.0.1 localhost
From: "Jain, Nishit" <nja...@underarmour.com>
Date: Monday, December 19, 201
Hi,
I am using the pre-built 'spark-2.0.1-bin-hadoop2.7' and when I try to start
pyspark, I get the following message.
Any ideas what could be wrong? I tried using python3 and setting SPARK_LOCAL_IP
to 127.0.0.1, but I get the same error.
~ -> cd /Applications/spark-2.0.1-bin-hadoop2.7/bin/
When I read a specific file it works:
val filePath= "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/hr=19/00"
val df = spark.read.avro(filePath)
But if I point to a folder to read date-partitioned data, it fails:
val filePath="s3n://bucket_name/f1/f2/avro/dt=2016-10-19/"
I get this error:
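A common fix when reading partition folders directly (a sketch; whether it applies here depends on the actual error, which is truncated above) is to set `basePath` so that Spark's partition discovery treats the `dt=`/`hr=` folders as partition columns:

```scala
// With the spark-avro package: point basePath at the table root so
// dt= and hr= are picked up as partition columns.
val df = spark.read
  .format("com.databricks.spark.avro")
  .option("basePath", "s3n://bucket_name/f1/f2/avro/")
  .load("s3n://bucket_name/f1/f2/avro/dt=2016-10-19/")
```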
Awesome, thanks Silvio!
From: Silvio Fiorito <silvio.fior...@granturing.com>
Date: Thursday, November 3, 2016 at 12:26 PM
To: "Jain, Nishit" <nja...@underarmour.com>,
Denny Lee <
From: Denny Lee <denny.g@gmail.com>
Date: Thursday, November 3, 2016 at 10:59 AM
To: "Jain, Nishit" <nja...@underarmour.com>,
"user@spark.apache.org" <user@spark.apache.org>
I have a lookup table in HANA database. I want to create a spark broadcast
variable for it.
What would be the suggested approach? Should I read it as a DataFrame and
convert the DataFrame into a broadcast variable?
Thanks,
Nishit
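One common approach (a sketch; the JDBC URL, table name, and column types are hypothetical placeholders): collect the small lookup table to the driver and broadcast a plain Map rather than the DataFrame itself, since a DataFrame cannot be used inside executor-side functions:

```scala
// Sketch: read the lookup table over JDBC, collect it to the driver
// as a Map, and broadcast that. Only sensible for small tables.
val lookupDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:sap://hana-host:30015")  // hypothetical URL
  .option("dbtable", "LOOKUP_TABLE")            // hypothetical table
  .load()

val lookupMap: Map[String, String] =
  lookupDf.collect().map(r => r.getString(0) -> r.getString(1)).toMap

val lookupBc = spark.sparkContext.broadcast(lookupMap)
// Executors then use lookupBc.value inside UDFs or map functions.
```

If the goal is a join rather than per-row lookups, wrapping the small DataFrame in `org.apache.spark.sql.functions.broadcast(lookupDf)` and joining is often simpler.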
From: Koert Kuipers <ko...@tresata.com>
Date: Thursday, October 27, 2016 at 12:49 PM
To: "Jain, Nishit" <nja...@underarmour.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Do you mind sharing why escaping should not work without quotes?
From: Koert Kuipers <ko...@tresata.com>
Date: Thursday, October 27, 2016 at 12:40 PM
To: "Jain, Nishit" <nja...@underarmour.com>
Cc:
Interesting finding: Escaping works if data is quoted but not otherwise.
From: "Jain, Nishit" <nja...@underarmour.com>
Date: Thursday, October 27, 2016 at 10:54 AM
To: "user@spark.apache.org"
I am using spark-core version 2.0.1 with Scala 2.11. I have simple code to read
a CSV file which has \ escapes.
val myDA = spark.read
  .option("quote", null)
  .schema(mySchema)
  .csv(filePath)
As per the documentation, \ is the default escape character for the CSV reader,
but it does not work. Spark is
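For comparison, the escape character can also be set explicitly (a sketch; `mySchema` and `filePath` as above). In Spark's CSV reader the escape character is generally only honored inside quoted fields, which appears to match the finding above that escaping works only when the data is quoted:

```scala
val myDA = spark.read
  .option("quote", "\"")   // keep the default quote character
  .option("escape", "\\")  // backslash as the escape character
  .schema(mySchema)
  .csv(filePath)
```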