But wouldn’t the partitioning column partition the data only in the Spark RDD? Would
it also partition the data on disk when it is written (dividing the data into folders)?
From: ayan guha <guha.a...@gmail.com>
Date: Friday, July 21, 2017 at 3:25 PM
To: "J
Is it possible to have the Spark DataFrame writer write based on RangePartitioning?
For example:
I have 10 distinct values for column_a, say 1 to 10.
df.write
.partitionBy("column_a")
The above code will by default create 10 folders: column_a=1, column_a=2,
..., column_a=10.
I want to see if it is possible
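To the layout question above: partitionBy controls the on-disk layout of the written files (Hive-style key=value folders, one per distinct value), which is separate from how the data is partitioned in memory. A plain-Python sketch of the directory structure the write above produces, with hypothetical rows standing in for the DataFrame:

```python
import os
import tempfile

# Hypothetical rows standing in for the DataFrame, column_a taking values 1..10.
rows = [{"column_a": v, "payload": f"row-{v}"} for v in range(1, 11)]

# Mimic df.write.partitionBy("column_a"): one key=value directory per
# distinct value of the partitioning column, data files inside each.
root = tempfile.mkdtemp()
for row in rows:
    part_dir = os.path.join(root, f"column_a={row['column_a']}")
    os.makedirs(part_dir, exist_ok=True)
    with open(os.path.join(part_dir, "part-00000"), "a") as f:
        f.write(row["payload"] + "\n")

folders = sorted(os.listdir(root))
print(folders)  # 10 folders: column_a=1 ... column_a=10
```

For controlling how rows are distributed across files before the write, later Spark versions add df.repartitionByRange, which can be combined with partitionBy; the folder layout itself, though, comes from partitionBy alone.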
That helped, thanks TD! :D
From: Tathagata Das <tathagata.das1...@gmail.com>
Date: Tuesday, June 6, 2017 at 3:26 AM
To: "Jain, Nishit" <nja...@underarmour.com>
Cc: "user@spark.apache.or
I have a very simple Spark Streaming job running locally in standalone mode.
There is a custom receiver which reads from a database and passes the data to the
main job, which prints the total. Not an actual use case, but I am playing around
to learn. The problem is that the job gets stuck forever; the logic is very
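The snippet is truncated, but one classic cause of a receiver-based streaming job hanging locally (an assumption here, not confirmed by the elided text) is core starvation: each receiver pins one core, so a master of local or local[1] leaves no core to process the received batches, and local[2] or more is needed. A minimal sketch of that arithmetic:

```python
def has_enough_cores(master_cores: int, num_receivers: int) -> bool:
    """Receiver-based streaming: each receiver pins one core, and at least
    one additional core is needed to actually process the received batches."""
    return master_cores >= num_receivers + 1

print(has_enough_cores(1, 1))  # False: local[1] with one receiver hangs
print(has_enough_cores(2, 1))  # True:  local[2] leaves a core for processing
```

This matches the general Spark Streaming guidance that the number of cores allocated to the application must exceed the number of receivers.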
Anyone else facing this issue?
I pulled 0.7.0 release from http://ranger.apache.org/download.html
Built it and found that there was no conf folder:
ranger-0.7.0-admin/ews/webapp/WEB-INF/classes -> ls
META-INF db_message_bundle.properties org resourcenamemap.properties
conf.dist log4jdbc.properties
Found it. Somehow my host mapping was messing it up. Changing it to point to
localhost worked:
/etc/hosts
#127.0.0.1 XX.com
127.0.0.1 localhost
From: "Jain, Nishit" <nja...@underarmour.com>
Date: Monday, December 19, 201
Hi,
I am using the pre-built 'spark-2.0.1-bin-hadoop2.7' and when I try to start
pyspark, I get the following message.
Any ideas what could be wrong? I tried using python3 and setting SPARK_LOCAL_IP to
127.0.0.1, but got the same error.
~ -> cd /Applications/spark-2.0.1-bin-hadoop2.7/bin/
When I read a specific file it works:
val filePath = "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/hr=19/00"
val df = spark.read.avro(filePath)
But if I point to a folder to read date-partitioned data, it fails:
val filePath = "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/"
I get this error:
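The error text is elided here, but the folder read relies on Spark's Hive-style partition discovery, which infers columns like dt and hr from key=value directory names. A rough plain-Python illustration of that parsing (not Spark's actual implementation; the paths are the ones from the snippet):

```python
def partition_values(path: str, base: str) -> dict:
    """Extract Hive-style key=value partition components below a base path."""
    rel = path[len(base):].strip("/")
    values = {}
    for segment in rel.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            values[key] = value
    return values

base = "s3n://bucket_name/f1/f2/avro"
print(partition_values("s3n://bucket_name/f1/f2/avro/dt=2016-10-19/hr=19/00", base))
# {'dt': '2016-10-19', 'hr': '19'}
```

If the elided error concerns schema or partition inference, commonly reported remedies include pointing the reader at the table root so discovery sees the full key=value chain, or setting the reader's basePath option; whether either applies depends on the actual error.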
Awesome, thanks Silvio!
From: Silvio Fiorito <silvio.fior...@granturing.com>
Date: Thursday, November 3, 2016 at 12:26 PM
To: "Jain, Nishit" <nja...@underarmour.com>,
Denny Lee <
From: Denny Lee <denny.g@gmail.com>
Date: Thursday, November 3, 2016 at 10:59 AM
To: "Jain, Nishit" <nja...@underarmour.com>,
"user@spark.apache.org"
I have a lookup table in HANA database. I want to create a spark broadcast
variable for it.
What would be the suggested approach? Should I read it as a data frame and
convert the data frame into a broadcast variable?
Thanks,
Nishit
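A commonly suggested pattern (a sketch under assumptions, not HANA-specific advice): read the small table into the driver, e.g. via spark.read.jdbc, collect it to a plain map, and broadcast that map rather than the DataFrame itself. The names and data below are hypothetical, and the Spark broadcast calls appear only in comments so the sketch stays self-contained:

```python
# Hypothetical lookup rows, standing in for a small HANA table collected
# to the driver (e.g. via spark.read.jdbc(...).collect()).
lookup_rows = [("SKU1", "Shoes"), ("SKU2", "Shirts")]
lookup = dict(lookup_rows)  # in Spark: bc = sc.broadcast(lookup)

def enrich(record):
    """Join one record against the lookup map; in Spark this would run
    inside a map function, referencing bc.value instead of lookup."""
    sku, qty = record
    return (sku, lookup.get(sku, "UNKNOWN"), qty)

orders = [("SKU1", 2), ("SKU3", 1)]
print([enrich(o) for o in orders])
# [('SKU1', 'Shoes', 2), ('SKU3', 'UNKNOWN', 1)]
```

Broadcasting the collected map (rather than the DataFrame) keeps one read-only copy per executor, which is the point of a broadcast variable for a small lookup table.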
From: Koert Kuipers <ko...@tresata.com>
Date: Thursday, October 27, 2016 at 12:49 PM
To: "Jain, Nishit" <nja...@underarmour.com<mailto:nja...@underarmour.com>>
Cc: "user@spark.apache.org"
Do you mind sharing why escaping should not work without quotes?
From: Koert Kuipers <ko...@tresata.com>
Date: Thursday, October 27, 2016 at 12:40 PM
To: "Jain, Nishit" <nja...@underarmour.com>
Cc:
Interesting finding: Escaping works if data is quoted but not otherwise.
From: "Jain, Nishit" <nja...@underarmour.com>
Date: Thursday, October 27, 2016 at 10:54 AM
To: "user@spark.apache.org"
I am using spark-core version 2.0.1 with Scala 2.11. I have simple code to read
a csv file which has \ escapes.
val myDA = spark.read
.option("quote", null)
.schema(mySchema)
.csv(filePath)
Per the documentation, \ is the default escape character for the CSV reader, but
it does not work. Spark is
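The quote/escape interplay can be illustrated outside Spark with Python's stdlib csv module — an analogy only, since Spark 2.x ships its own CSV parser. The sample line below is hypothetical; the point is that whether an escape character is honored depends on how the parser's quote/escape options are configured:

```python
import csv

line = ["a\\,b,c"]  # one raw CSV line: a\,b,c — the first comma is escaped

# Without an escape character the backslash is just data; the comma splits.
print(list(csv.reader(line)))  # [['a\\', 'b', 'c']]

# With an escape character configured (quoting disabled), the escaped
# comma survives as part of the first field.
print(list(csv.reader(line, escapechar="\\", quoting=csv.QUOTE_NONE)))
# [['a,b', 'c']]
```

This mirrors the finding in the thread: the escape setting only takes effect under the parser configuration that actually engages it, so disabling quoting outright (quote = null) can behave differently from leaving a quote character defined.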