Re: build spark source code

2017-11-22 Thread Jörn Franke
You can check whether Apache Bigtop provides something like this for Spark on Windows (though probably based on mvn rather than sbt).

Re: Custom Data Source for getting data from Rest based services

2017-11-22 Thread sathich
Hi Sourav, This is quite a useful addition to the Spark family; this is a use case that comes up more often than it is talked about: * getting third-party mapping data (geo coordinates), * accessing database data through REST, * downloading data from a bulk data API service. It will be really useful to
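As a rough illustration of the last point, a paged bulk-data API is usually drained with a loop like the one below before the collected rows are handed to something like spark.createDataFrame. The fetch_page callable is injected here so the sketch stays network-free; in practice it would wrap something like requests.get(...).json().

```python
def fetch_all(fetch_page, page_size=100):
    """Collect rows from a paged API until a short (final) page is returned.

    fetch_page(page, size) -> list of rows for that page.
    """
    rows, page = [], 0
    while True:
        batch = fetch_page(page, page_size)
        rows.extend(batch)
        if len(batch) < page_size:  # a short page means we reached the end
            return rows
        page += 1


if __name__ == "__main__":
    data = list(range(250))  # stand-in for the remote dataset

    def fake_page(page, size):
        return data[page * size:(page + 1) * size]

    print(len(fetch_all(fake_page)))  # 250
```

A real REST source would add retries and rate-limiting around fetch_page; the paging loop itself stays the same.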

Re: does "Deep Learning Pipelines" scale out linearly?

2017-11-22 Thread Nick Pentreath
For that package specifically, it's best to see if they have a mailing list and, if not, perhaps ask on GitHub issues. Having said that, perhaps the folks involved in that package will reply here too.

SparkSQL does not support CharType

2017-11-22 Thread 163
Hi, when I use a DataFrame with this table schema, it goes wrong:

val test_schema = StructType(Array(
  StructField("id", IntegerType, false),
  StructField("flag", CharType(1), false),
  StructField("time", DateType, false)));
val df = spark.read.format("com.databricks.spark.csv")
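For context: in Spark of this era, CharType exists internally but is not accepted in a user-supplied runtime schema, so the usual workaround is to declare the column as StringType and enforce the CHAR(n) width yourself, e.g. inside a UDF or a pre-write filter. A plain-Python sketch of that length check, with SQL-style right-padding (the pad-or-reject policy is an assumption, not Spark behavior):

```python
def check_char(value, n):
    """Validate that `value` fits a CHAR(n) column; pad like SQL CHAR semantics."""
    if value is None or len(value) > n:
        raise ValueError(f"value {value!r} does not fit CHAR({n})")
    return value.ljust(n)  # right-pad to the declared width


print(check_char("Y", 1))  # Y
```

In the schema above, "flag" would then be StructField("flag", StringType, false) with this check applied to the column values.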

build spark source code

2017-11-22 Thread Michael Artz
It would be nice if I could download the source code of Spark from GitHub, then build it with sbt on my Windows machine, and use IntelliJ to make little modifications to the code base. I have installed Spark on Windows quite a few times before, but I just use the packaged artifact. Has anyone
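For reference, these are the standard commands from Spark's own build documentation, run from the root of a source checkout; on Windows they would typically be run from Git Bash or WSL, or by invoking a locally installed mvn directly (the build scripts are Unix shell scripts):

```shell
# build without running tests (the documented Maven build)
./build/mvn -DskipTests clean package

# or, with the bundled sbt launcher
./build/sbt package
```

After a successful build, the source tree can be opened in IntelliJ as a Maven or sbt project for making small modifications.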

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-22 Thread Nicolas Paris
Hey, finally I improved the spark-hive SQL performance a lot. I had a problem with a topology_script.py that produced huge error log traces and reduced Spark performance in Python mode. I just corrected the python2 scripts to be python3 ready. I had some problem with a broadcast variable while
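For anyone hitting the same thing: a Hadoop rack-topology script is invoked with one or more host/IP arguments and must print a rack path for each, and the usual python2-ism that breaks under python3 is the print statement. A minimal python3-ready sketch (the host-to-rack mapping here is made up):

```python
#!/usr/bin/env python3
# Sketch of a python3-ready Hadoop rack-topology script (mapping is illustrative).
import sys

RACKS = {"10.0.0.1": "/dc1/rack1", "10.0.0.2": "/dc1/rack2"}
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Return the rack path for a host/IP, falling back to the default rack."""
    return RACKS.get(host, DEFAULT_RACK)


if __name__ == "__main__":
    # print() as a function works on python3; the py2 `print x` form does not
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```

Other common py2-to-py3 fixes in such scripts are dict.iteritems() to dict.items() and explicit bytes/str decoding.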

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Did you check that the security extensions (JCE) are installed?
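One quick way to check, assuming a JDK 8 era install where the jrunscript tool is still shipped with the JDK (the limited-vs-unlimited distinction is exactly what the JCE policy files control):

```shell
# prints the maximum allowed AES key length: 128 means only the limited
# JCE policy is installed; a larger value means the unlimited policy is active
jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'
```

Kerberos AES-256 tickets fail with the limited policy, which matches the symptoms described in this thread.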

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
[image: Inline image 1] This is what we are on.

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
We use the Oracle JDK; we are on Unix.

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Do you use the Oracle JDK or OpenJDK? We recently had an issue with OpenJDK: formerly, the Java security extensions were installed by default; this is no longer the case on CentOS 7.3. Are these installed?

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
I passed a keytab; renewal is handled by a script that renews the user's ticket every eight hours.

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Did you pass a keytab? Is renewal enabled in your KDC?

Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
Hi, I have written a Spark Streaming job and it runs successfully for more than 36 hours. After around 36 hours the job fails with a Kerberos issue. Any solution on how to resolve it? org.apache.spark.SparkException: Task failed while writing rows. at
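For long-running jobs on a kerberized YARN cluster, a common setup is to hand spark-submit a principal and keytab directly (real spark-submit options), so that Spark can re-obtain credentials itself instead of relying on an external renewal script; the principal, paths, and class name below are placeholders:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal myuser@EXAMPLE.COM \
  --keytab /etc/security/keytabs/myuser.keytab \
  --class com.example.MyStreamingJob \
  my-streaming-job.jar
```

A failure at a consistent interval like ~36 hours often lines up with a ticket or delegation-token lifetime configured in the KDC or Hadoop, which is worth checking alongside the submit options.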

newbie: how to partition data on file system. What are best practices?

2017-11-22 Thread Andy Davidson
I am working on a deep learning project. Currently we do everything on a single machine. I am trying to figure out how we might be able to move to a clustered Spark environment. Clearly it's possible that a machine or job on the cluster might fail, so I assume that the data needs to be replicated to
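On the layout question: Spark's own convention is Hive-style key=value partition directories, which df.write.partitionBy("year", "month", "day") produces automatically (and on HDFS, replication is handled per-block by the filesystem, not by your directory layout). A plain-Python sketch of the path shape:

```python
import posixpath


def partition_path(base, **parts):
    """Build a Hive-style partition path like base/year=2017/month=11/day=22."""
    segments = [f"{key}={value}" for key, value in parts.items()]
    return posixpath.join(base, *segments)


print(partition_path("data", year=2017, month=11, day=22))
# data/year=2017/month=11/day=22
```

Readers that understand this layout (Spark, Hive, Presto) can then prune partitions from the path alone, so queries filtered on year/month/day only touch the matching directories.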

Re: Writing files to s3 with out temporary directory

2017-11-22 Thread Haoyuan Li
This blog / tutorial may be helpful for running Spark in the cloud with Alluxio. Best regards, Haoyuan On Mon, Nov 20, 2017 at 2:12 PM, lucas.g...@gmail.com wrote: > That sounds like a lot of work, and if I

does "Deep Learning Pipelines" scale out linearly?

2017-11-22 Thread Andy Davidson
I am starting a new deep learning project. Currently we do all of our work on a single machine using a combination of Keras and TensorFlow. https://databricks.github.io/spark-deep-learning/site/index.html looks very promising. Any idea how performance is likely to improve as I add machines to my

Spark Streaming Hive Dynamic Partitions Issue

2017-11-22 Thread KhajaAsmath Mohammed
Hi, I am able to write data into Hive tables from Spark Streaming. The job ran successfully for 37 hours and then I started getting task failure errors as below. The Hive table has data up until the point the tasks failed. Job aborted due to stage failure: Task 0 in stage 691.0 failed 4 times, most recent
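The truncated trace does not show the root cause, but if the failures turn out to be dynamic-partition limits (a common cause for long-running streaming writes into Hive), these are the usual Hive settings to check; the numeric values below are only illustrative and should be tuned for the table:

```sql
-- enable dynamic partitioning and raise the default caps
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.max.dynamic.partitions = 10000;
SET hive.exec.max.dynamic.partitions.pernode = 1000;
```

Streaming writes also tend to accumulate many small partitions over time, so checking how many partitions 37 hours of micro-batches created is a good first diagnostic.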

Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

2017-11-22 Thread Vadim Semenov
The error message seems self-explanatory; try to figure out what disk quota you have for your user.
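Worth spelling out: "Disk quota exceeded" is a per-user limit, so the filesystem can report plenty of free space while the write still fails. A small Python sketch of the distinction — shutil reports filesystem-level space, while the per-user figure needs the Unix quota command:

```python
import shutil

# filesystem-level free space: can be large even when your user is over quota
usage = shutil.disk_usage("/tmp")
print(f"filesystem free: {usage.free // 1024 ** 2} MiB")

# the per-user limit is separate; check it in a shell with:  quota -s
```

If quota -s shows the user at its block limit, "enough disk space available" on the device is irrelevant and the quota (or the write location) has to change.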

Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

2017-11-22 Thread Chetan Khatri
Anybody reply on this? On Tue, Nov 21, 2017 at 3:36 PM, Chetan Khatri wrote: > > Hello Spark Users, > > I am getting the below error when I am trying to write a dataset to a parquet > location. I have enough disk space available. Last time I was facing the same > kind of