Re: SparkSQL issue: Spark 1.3.1 + hadoop 2.6 on CDH5.3 with parquet

2016-06-20 Thread Satya
Hello, we are also experiencing the same error. Can you please provide the steps that resolved the issue? Thanks, Satya -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-issue-Spark-1-3-1-hadoop-2-6-on-CDH5-3-with-parquet-tp22808p27197.html

Re: FAILED_TO_UNCOMPRESS Error - Spark 1.3.1

2016-05-30 Thread Takeshi Yamamuro
ames for our use case where we are > getting this exception. > > The parameters used are listed below. Kindly suggest if we are missing > something. > > Version used is Spark 1.3.1 > > Jira is still showing this issue as Open > https://issues.apache.org/jira/browse/SPAR

FAILED_TO_UNCOMPRESS Error - Spark 1.3.1

2016-05-30 Thread Prashant Singh Thakur
Hi, We are trying to use Spark Data Frames for our use case where we are getting this exception. The parameters used are listed below. Kindly suggest if we are missing something. Version used is Spark 1.3.1 Jira is still showing this issue as Open https://issues.apache.org/jira/browse/SPARK

Re: Migrating Transformers from Spark 1.3.1 to 1.5.0

2016-02-15 Thread Cesar Flores
I found my problem. I was calling setParameterValue(defaultValue) more than one time in the hierarchy of my classes. Thanks! On Mon, Feb 15, 2016 at 6:34 PM, Cesar Flores <ces...@gmail.com> wrote: > > I have a set of transformers (each with specific parameters) in spark > 1.

Migrating Transformers from Spark 1.3.1 to 1.5.0

2016-02-15 Thread Cesar Flores
I have a set of transformers (each with specific parameters) in spark 1.3.1. I have two versions, one that works and one that does not: 1.- working version //featureprovidertransformer contains already a set of ml params class DemographicTransformer(override val uid: String) extends

Spark 1.3.1 - Does SparkContext in multi-threaded env require SparkEnv.set(env) anymore

2015-12-10 Thread Nirav Patel
As the subject says, do we still need to set a static env in every thread that accesses the SparkContext? I read some references here: http://qnalist.com/questions/4956211/is-spark-context-in-local-mode-thread-safe

Re: Spark 1.3.1 - Does SparkContext in multi-threaded env require SparkEnv.set(env) anymore

2015-12-10 Thread Josh Rosen
Nope, you shouldn't have to do that anymore. As of https://github.com/apache/spark/pull/2624, which is in Spark 1.2.0+, SparkEnv's thread-local stuff was removed and replaced by a simple global variable (since it was used in an *effectively* global way before (see my comments on that PR)). As a
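A minimal sketch of what this enables on Spark 1.2.0+ — several threads submitting jobs against one shared SparkContext, with no SparkEnv.set call anywhere; the app name, thread count and job bodies below are illustrative only:

    import java.util.concurrent.{Executors, TimeUnit}

    import org.apache.spark.{SparkConf, SparkContext}

    object MultiThreadedJobs {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("multi-threaded-jobs"))
        val pool = Executors.newFixedThreadPool(4)

        (1 to 4).foreach { i =>
          pool.submit(new Runnable {
            override def run(): Unit = {
              // No SparkEnv.set(env) needed per thread anymore
              val sum = sc.parallelize(1 to 1000).reduce(_ + _)
              println(s"job $i -> $sum")
            }
          })
        }

        pool.shutdown()
        pool.awaitTermination(10, TimeUnit.MINUTES)
        sc.stop()
      }
    }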

Re: How to handle the UUID in Spark 1.3.1

2015-10-09 Thread Ted Yu
This is related: SPARK-10501 On Fri, Oct 9, 2015 at 7:28 AM, java8964 <java8...@hotmail.com> wrote: > Hi, Sparkers: > > In this case, I want to use Spark as an ETL engine to load the data from > Cassandra, and save it into HDFS. > > Here is the environment specified info

How to handle the UUID in Spark 1.3.1

2015-10-09 Thread java8964
Hi, Sparkers: In this case, I want to use Spark as an ETL engine to load the data from Cassandra and save it into HDFS. Here is the environment information: Spark 1.3.1, Cassandra 2.1, HDFS/Hadoop 2.2. I am using the Cassandra Spark Connector 1.3.x, with which I have no problem querying the C*

RE: How to handle the UUID in Spark 1.3.1

2015-10-09 Thread java8964
Thanks, Ted. Does this mean I am out of luck for now? If I use HiveContext, and cast the UUID as string, will it work? Yong Date: Fri, 9 Oct 2015 09:09:38 -0700 Subject: Re: How to handle the UUID in Spark 1.3.1 From: yuzhih...@gmail.com To: java8...@hotmail.com CC: user@spark.apache.org
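Until SPARK-10501 lands there is no UUID column type in Spark SQL, so one workaround is to turn the UUID into its string form before Spark SQL ever sees it. A sketch only — the row shapes, RDD name and output path below are assumptions, not taken from the original job:

    import java.util.UUID

    import org.apache.spark.sql.SQLContext

    // Hypothetical shapes for the Cassandra rows and the HDFS output
    case class RawEvent(id: UUID, payload: String)
    case class HdfsEvent(id: String, payload: String)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // rawEvents: RDD[RawEvent], e.g. produced by the Cassandra connector (assumed)
    val df = rawEvents.map(e => HdfsEvent(e.id.toString, e.payload)).toDF()
    df.saveAsParquetFile("hdfs:///tmp/events.parquet")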

Re: How to handle the UUID in Spark 1.3.1

2015-10-09 Thread Ted Yu
-- > Date: Fri, 9 Oct 2015 09:09:38 -0700 > Subject: Re: How to handle the UUID in Spark 1.3.1 > From: yuzhih...@gmail.com > To: java8...@hotmail.com > CC: user@spark.apache.org > > > This is related: > SPARK-10501 > > On Fri, Oct 9, 2015 at 7:28 AM, java8964 &l

Re: Spark 1.3.1 on Yarn not using all given capacity

2015-10-06 Thread Cesar Berezowski
3 cores*, not 8. César. > On 6 Oct 2015, at 19:08, Cesar Berezowski <ce...@adaltas.com> wrote: > > I deployed hdp 2.3.1 and got spark 1.3.1, spark 1.4 is supposed to be > available as technical preview I think > > vendor’s forum? you mean hortonworks'? >

Spark 1.3.1 on Yarn not using all given capacity

2015-10-06 Thread czoo
Hi, This post might be a duplicate with updates from another one (by me), sorry in advance. I have an HDP 2.3 cluster running Spark 1.3.1 on 6 nodes (edge + master + 4 workers). Each worker has 8 cores and 40G of RAM available in Yarn. That makes a total of 160GB and 32 cores. I'm running a job

Re: Spark 1.3.1 on Yarn not using all given capacity

2015-10-06 Thread Ted Yu
; > I have an HDP 2.3 cluster running Spark 1.3.1 on 6 nodes (edge + master + 4 > workers) > Each worker has 8 cores and 40G of RAM available in Yarn > > That makes a total of 160GB and 32 cores > > I'm running a job with the following parameters : > --master yarn-client > --n

Re: Spark 1.3.1 saveAsParquetFile hangs on app exit

2015-08-26 Thread cingram
spark-shell-hang-on-exit.tdump http://apache-spark-user-list.1001560.n3.nabble.com/file/n24461/spark-shell-hang-on-exit.tdump -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-1-saveAsParquetFile-hangs-on-app-exit-tp24460p24461.html

Spark 1.3.1 saveAsParquetFile hangs on app exit

2015-08-26 Thread cingram
I have a simple test that is hanging when using s3a with spark 1.3.1. Is there something I need to do to cleanup the S3A file system? The write to S3 appears to have worked but this job hangs in the spark-shell and using spark-submit. Any help would be greatly appreciated. TIA. import

Re: Spark 1.3.1 saveAsParquetFile hangs on app exit

2015-08-26 Thread Cheng Lian
Could you please show jstack result of the hanged process? Thanks! Cheng On 8/26/15 10:46 PM, cingram wrote: I have a simple test that is hanging when using s3a with spark 1.3.1. Is there something I need to do to cleanup the S3A file system? The write to S3 appears to have worked but this job

Running spark shell on mesos with zookeeper on spark 1.3.1

2015-08-24 Thread kohlisimranjit
I have set up Apache Mesos using Mesosphere on CentOS 6 with Java 8. I have 3 slaves which total 3 cores and 8 GB RAM. I have set no firewalls. I am trying to run the following lines of code to test whether the setup is working: val data = 1 to 1 val distData = sc.parallelize(data)
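The code above is cut off; a complete version of that kind of smoke test, assuming the spark-shell's sc and an arbitrary range, is simply:

    // Run from spark-shell pointed at the Mesos master URL
    val data = 1 to 10000
    val distData = sc.parallelize(data)

    // A trivial action to confirm tasks actually run on the slaves
    println(distData.reduce(_ + _))   // expected: 50005000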

ClassCastException when saving a DataFrame to parquet file (saveAsParquetFile, Spark 1.3.1) using Scala

2015-08-21 Thread Emma Boya Peng
Hi, I was trying to programmatically specify a schema and apply it to a RDD of Rows and save the resulting DataFrame as a parquet file. Here's what I did: 1. Created an RDD of Rows from RDD[Array[String]]: val gameId= Long.valueOf(line(0)) val accountType = Long.valueOf(line(1)) val

ClassCastException when saving a DataFrame to parquet file (saveAsParquetFile, Spark 1.3.1) using Scala

2015-08-21 Thread Emma Boya Peng
Hi, I was trying to programmatically specify a schema and apply it to a RDD of Rows and save the resulting DataFrame as a parquet file, but I got java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long on the last step. Here's what I did: 1. Created an RDD of Rows from
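That ClassCastException usually means a Row still holds a String in a column declared as LongType. A sketch of the pattern with explicit conversions — the column names, input RDD and output path are assumptions, not the original code:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

    // lines: RDD[Array[String]], sqlContext: SQLContext — both assumed to exist
    val schema = StructType(Seq(
      StructField("gameId", LongType, nullable = false),
      StructField("accountType", LongType, nullable = false),
      StructField("name", StringType, nullable = true)))

    // Each value placed in a Row must match the declared type; leaving the raw
    // String in a LongType column is what blows up in the Parquet writer.
    val rows = lines.map(f => Row(f(0).toLong, f(1).toLong, f(2)))

    sqlContext.createDataFrame(rows, schema).saveAsParquetFile("hdfs:///tmp/games.parquet")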

Re: intellij14 compiling spark-1.3.1 got error: assertion failed: com.google.protobuf.InvalidProtocalBufferException

2015-08-09 Thread Ted Yu
Can you check if there is a protobuf version other than 2.5.0 on the classpath? Please show the complete stack trace. Cheers On Sun, Aug 9, 2015 at 9:41 AM, longda...@163.com longda...@163.com wrote: hi all, i compile spark-1.3.1 on linux use intellij14 and got error assertion failed

Re: intellij14 compiling spark-1.3.1 got error: assertion failed: com.google.protobuf.InvalidProtocalBufferException

2015-08-09 Thread longda...@163.com
the stack trace is below Error:scalac: while compiling: /home/xiaoju/data/spark-1.3.1/core/src/main/scala/org/apache/spark/SparkContext.scala during phase: typer library version: version 2.10.4 compiler version: version 2.10.4 reconstructed args: -nobootcp

intellij14 compiling spark-1.3.1 got error: assertion failed: com.google.protobuf.InvalidProtocalBufferException

2015-08-09 Thread longda...@163.com
Hi all, I compiled spark-1.3.1 on Linux using IntelliJ 14 and got the error "assertion failed: com.google.protobuf.InvalidProtocalBufferException". How can I solve this problem? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/intellij14-compiling-spark-1-3-1-got

Re: Re: intellij14 compiling spark-1.3.1 got error: assertion failed: com.google.protobuf.InvalidProtocalBufferException

2015-08-09 Thread 龙淡
Thank you for the reply. I use sbt to compile Spark, but there are both protobuf 2.4.1 and 2.5.0 in the Maven repository, and protobuf 2.5.0 in the .ivy repository. The stack trace is below: Error:scalac: while compiling: /home/xiaoju/data/spark-1.3.1/core/src/main/scala/org/apache/spark

databricks spark sql csv FAILFAST not failing, Spark 1.3.1 Java 7

2015-07-22 Thread Adam Pritchard
numerous invalid csv files. Any advice? spark 1.3.1 running on mapr vm 4.1.0, java 1.7 SparkConf conf = new SparkConf().setAppName("Dataframe testing"); JavaSparkContext sc = new JavaSparkContext(conf); SQLContext sqlContext = new SQLContext(sc); HashMap<String, String> options = new HashMap<String
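For reference, the 1.3-era way to ask spark-csv for fail-fast parsing looks roughly like the sketch below (Scala rather than Java, and the path is a placeholder); whether FAILFAST actually aborts on malformed rows was exactly the question in this thread:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Option names as documented by spark-csv; mode can be PERMISSIVE,
    // DROPMALFORMED or FAILFAST.
    val options = Map(
      "path"   -> "/data/input.csv",
      "header" -> "true",
      "mode"   -> "FAILFAST")

    val df = sqlContext.load("com.databricks.spark.csv", options)
    df.count()   // with FAILFAST a malformed line should surface as an exception here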

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-07-22 Thread Eugene Morozov
Hi, I’m stuck with the same issue, but I see org.apache.hadoop.fs.s3native.NativeS3FileSystem in the hadoop-core:1.0.4 (that’s the current hadoop-client I use) and this far is transitive dependency that comes from spark itself. I’m using custom build of spark 1.3.1 with hadoop-client 1.0.4

Re: Spark 1.3.1 + Hive: write output to CSV with header on S3

2015-07-17 Thread spark user
password = ; String url = jdbc:hive2://quickstart.cloudera:1/default;  On Friday, July 17, 2015 2:29 AM, Roberto Coluccio roberto.coluc...@gmail.com wrote: Hello community, I'm currently using Spark 1.3.1 with Hive support for outputting processed data on an external Hive table

Re: Spark 1.3.1 + Hive: write output to CSV with header on S3

2015-07-17 Thread Michael Armbrust
: Hello community, I'm currently using Spark 1.3.1 with Hive support for outputting processed data on an external Hive table backed on S3. I'm using a manual specification of the delimiter, but I'd want to know if is there any clean way to write in CSV format: *val* sparkConf = *new* SparkConf

Spark 1.3.1 + Hive: write output to CSV with header on S3

2015-07-17 Thread Roberto Coluccio
Hello community, I'm currently using Spark 1.3.1 with Hive support for outputting processed data on an external Hive table backed on S3. I'm using a manual specification of the delimiter, but I'd like to know if there is any clean way to write in CSV format: *val* sparkConf = *new* SparkConf
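A minimal HiveQL sketch of the "manual delimiter" route mentioned above — table, columns, source table and bucket are placeholders, and note that Hive itself will not emit a header row, which is the crux of the question:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // External, comma-delimited text table living on S3
    hiveContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS report_out (id BIGINT, label STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 's3n://my-bucket/report_out/'""")

    // `processed_data` stands in for whatever table holds the results
    hiveContext.sql("INSERT OVERWRITE TABLE report_out SELECT id, label FROM processed_data")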

[Spark 1.3.1] Spark HiveQL - CDH 5.3 Hive 0.13 UDF's

2015-06-26 Thread Mike Frampton
Hi I have a five node CDH 5.3 cluster running on CentOS 6.5, I also have a separate install of Spark 1.3.1. ( The CDH 5.3 install has Spark 1.2 but I wanted a newer version. ) I managed to write some Scala based code using a Hive Context to connect to Hive and create/populate tables etc

[Spark 1.3.1 SQL] Using Hive

2015-06-21 Thread Mike Frampton
Hi, Is it true that if I want to use Spark SQL (for Spark 1.3.1) against Apache Hive I need to build a source version of Spark? I'm using CDH 5.3 on CentOS Linux 6.5, which uses Hive 0.13.0 (I think). cheers Mike F

RE: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Andrew Lee
-client mode. Not sure whether your other application is running under the same mode or a different one. Try specifying yarn-client mode and see if you get the same result as spark-shell. From: roberto.coluc...@gmail.com Date: Wed, 10 Jun 2015 14:32:04 +0200 Subject: [Spark 1.3.1 on YARN on EMR] Unable

Re: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Bozeman, Christopher
. From: roberto.coluc...@gmail.com Date: Wed, 10 Jun 2015 14:32:04 +0200 Subject: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient To: user@spark.apache.org Hi! I'm

Re: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Roberto Coluccio
: Wed, 10 Jun 2015 14:32:04 +0200 Subject: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient To: user@spark.apache.org Hi! I'm struggling with an issue with Spark 1.3.1 running on YARN, running on an AWS EMR cluster. Such cluster

Spark SQL DATE_ADD function - Spark 1.3.1 1.4.0

2015-06-17 Thread Nathan McCarthy
| +--+--+--+--+ |2015-04-06|2015-04-06|2015-04-07|2015-04-08| +--+--+--+--+ It seems to miss a date, even though the where clause has 31st in it. When the date is just a string the select clause seems to work fine. Problem appears in Spark 1.3.1

Re: Not getting event logs = spark 1.3.1

2015-06-16 Thread Tsai Li Ming
Forgot to mention this is on standalone mode. Is my configuration wrong? Thanks, Liming On 15 Jun, 2015, at 11:26 pm, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, I have this in my spark-defaults.conf (same for hdfs): spark.eventLog.enabled true spark.eventLog.dir

Not getting event logs = spark 1.3.1

2015-06-15 Thread Tsai Li Ming
Hi, I have this in my spark-defaults.conf (same for hdfs): spark.eventLog.enabled true spark.eventLog.dir file:/tmp/spark-events spark.history.fs.logDirectory file:/tmp/spark-events While the app is running, there is a “.inprogress” directory. However when the job
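One thing worth ruling out here (a guess, not a confirmed diagnosis of this report): the event log is only finalized when the context shuts down cleanly, so a driver that never reaches sc.stop() leaves the ".inprogress" file behind. A sketch:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("event-log-example")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "file:/tmp/spark-events")

    val sc = new SparkContext(conf)
    try {
      sc.parallelize(1 to 100).count()   // placeholder workload
    } finally {
      sc.stop()   // renames the .inprogress event log so the history server can pick it up
    }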

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-11 Thread Josh Mahonin
:7077 /opt/mapr/spark/spark-1.3.1/bin/spark-submit --jars /home/mapr/projects/customer/lib/spark-streaming-kafka_2.10-1.3.1.jar,/home/mapr/projects/customer/lib/kafka_2.10-0.8.1.1.jar,/home/mapr/projects/customer/lib/zkclient-0.3.jar,/home/mapr/projects/customer/lib/metrics

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-11 Thread Jeroen Vlek
in Java. The dependencies in the pom.xml all have the scope provided. The job is submitted as follows: $ rm spark.log MASTER=spark://maprdemo:7077 /opt/mapr/spark/spark-1.3.1/bin/spark-submit --jars /home/mapr/projects/customer/lib/spark-streaming-kafka_2.10-1.3.1.jar,/home/mapr

[Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-10 Thread Roberto Coluccio
Hi! I'm struggling with an issue with Spark 1.3.1 running on YARN, running on an AWS EMR cluster. Such cluster is based on AMI 3.7.0 (hence Amazon Linux 2015.03, Hive 0.13 already installed and configured on the cluster, Hadoop 2.4, etc...). I make use of the AWS emr-bootstrap-action *install

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Josh Mahonin
as follows: $ rm spark.log MASTER=spark://maprdemo:7077 /opt/mapr/spark/spark-1.3.1/bin/spark-submit --jars /home/mapr/projects/customer/lib/spark-streaming-kafka_2.10-1.3.1.jar,/home/mapr/projects/customer/lib/kafka_2.10-0.8.1.1.jar,/home/mapr/projects/customer/lib/zkclient-0.3.jar,/home/mapr/projects

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-10 Thread Jeroen Vlek
Hi Josh, Thank you for your effort. Looking at your code, I feel that mine is semantically the same, except written in Java. The dependencies in the pom.xml all have the scope provided. The job is submitted as follows: $ rm spark.log MASTER=spark://maprdemo:7077 /opt/mapr/spark/spark-1.3.1

Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-09 Thread Jeroen Vlek
ClientRpcControllerFactory), and I've tried a lean jar while specifying the jars on the command line. For the latter, the command I used is as follows: /opt/mapr/spark/spark-1.3.1/bin/spark-submit --jars lib/spark-streaming-kafka_2.10-1.3.1.jar,lib/kafka_2.10-0.8.1.1.jar,lib/zkclient-0.3.jar,lib/metrics

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-09 Thread Josh Mahonin
assembly and shade plugins; I've inspected the jars, they **do** contain ClientRpcControllerFactory), and I've tried a lean jar while specifying the jars on the command line. For the latter, the command I used is as follows: /opt/mapr/spark/spark-1.3.1/bin/spark-submit --jars lib/spark

Re: Spark 1.3.1 SparkSQL metastore exceptions

2015-06-09 Thread Cheng Lian
Seems that you're using a DB2 Hive metastore? I'm not sure whether Hive 0.12.0 officially supports DB2, but probably not? (Since I didn't find DB2 scripts under the metastore/scripts/upgrade folder in Hive source tree.) Cheng On 6/9/15 8:28 PM, Needham, Guy wrote: Hi, I’m using Spark 1.3.1

Spark 1.3.1 SparkSQL metastore exceptions

2015-06-09 Thread Needham, Guy
Hi, I'm using Spark 1.3.1 to insert into a Hive 0.12 table from a SparkSQL query. The query is a very simple select from a dummy Hive table used for benchmarking. I'm using a create table as statement to do the insert. No matter if I do that or an insert overwrite, I get the same Hive exception
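For readers reproducing this, the statements involved are plain HiveQL issued through a HiveContext; a sketch with placeholder table names:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // create-table-as-select form
    hiveContext.sql("CREATE TABLE benchmark_out AS SELECT * FROM dummy_benchmark")

    // insert-overwrite form against an existing table
    hiveContext.sql("INSERT OVERWRITE TABLE benchmark_out2 SELECT * FROM dummy_benchmark")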

Re: Spark 1.3.1 On Mesos Issues.

2015-06-08 Thread John Omernik
(but apparently not in local mode) on one or more nodes? (Side question: Does your node experiment fail on all nodes?) Put another way, are the classpaths good for all JVM tasks? 2. Can you use just MapR and Spark 1.3.1 successfully, bypassing Mesos? Incidentally, how are you combining Mesos

Re: Caching parquet table (with GZIP) on Spark 1.3.1

2015-06-07 Thread Cheng Lian
/apache/parquet-mr/tree/master/parquet-tools On 5/26/15 3:26 PM, shsh...@tsmc.com wrote: we tried to cache table through hiveCtx = HiveContext(sc) hiveCtx.cacheTable(table name) as described on Spark 1.3.1's document and we're on CDH5.3.0 with Spark 1.3.1 built with Hadoop 2.6 following error

Re: Which class takes the place of BlockManagerWorker in Spark 1.3.1

2015-06-06 Thread Ted Yu
that there is a class called BlockManagerWorker in spark previous releases. In the 1.3.1 code, I could see that some method comment still refers to BlockManagerWorker which doesn't exist at all. I would ask which class takes place of BlockManagerWorker in Spark 1.3.1? Thanks. BTW

Which class takes the place of BlockManagerWorker in Spark 1.3.1

2015-06-06 Thread bit1...@163.com
Hi, I remember that there was a class called BlockManagerWorker in previous Spark releases. In the 1.3.1 code, I can see that some method comments still refer to BlockManagerWorker, which doesn't exist at all. May I ask which class takes the place of BlockManagerWorker in Spark 1.3.1? Thanks

Re: Spark 1.3.1 On Mesos Issues.

2015-06-05 Thread Steve Loughran
or more nodes? (Side question: Does your node experiment fail on all nodes?) Put another way, are the classpaths good for all JVM tasks? 2. Can you use just MapR and Spark 1.3.1 successfully, bypassing Mesos? Incidentally, how are you combining Mesos and MapR? Are you running Spark in Mesos

Re: Spark 1.3.1 On Mesos Issues.

2015-06-05 Thread John Omernik
mode) on one or more nodes? (Side question: Does your node experiment fail on all nodes?) Put another way, are the classpaths good for all JVM tasks? 2. Can you use just MapR and Spark 1.3.1 successfully, bypassing Mesos? Incidentally, how are you combining Mesos and MapR? Are you running

Re: Spark 1.3.1 On Mesos Issues.

2015-06-05 Thread Tim Chen
versions of some Spark jars that get picked up at run time (but apparently not in local mode) on one or more nodes? (Side question: Does your node experiment fail on all nodes?) Put another way, are the classpaths good for all JVM tasks? 2. Can you use just MapR and Spark 1.3.1 successfully, bypassing

Error running sbt package on Windows 7 for Spark 1.3.1 and SimpleApp.scala

2015-06-04 Thread Joseph Washington
Hi all, I'm trying to run the standalone application SimpleApp.scala following the instructions on the http://spark.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala I was able to create a .jar file by doing sbt package. However when I tried to do $

Re: Spark 1.3.1 On Mesos Issues.

2015-06-04 Thread John Omernik
MapR and Spark 1.3.1 successfully, bypassing Mesos? Incidentally, how are you combining Mesos and MapR? Are you running Spark in Mesos, but accessing data in MapR-FS? Perhaps the MapR shim library doesn't support Spark 1.3.1. HTH, dean Dean Wampler, Ph.D. Author: Programming Scala, 2nd

Re: Spark 1.3.1 bundle does not build - unresolved dependency

2015-06-02 Thread Akhil Das
You can try to skip the tests, try with: mvn -Dhadoop.version=2.4.0 -Pyarn *-DskipTests* clean package Thanks Best Regards On Tue, Jun 2, 2015 at 2:51 AM, Stephen Boesch java...@gmail.com wrote: I downloaded the 1.3.1 distro tarball $ll ../spark-1.3.1.tar.gz -rw-r-@ 1 steve staff

Re: Re: spark 1.3.1 jars in repo1.maven.org

2015-06-02 Thread Shixiong Zhu
this is causing issues upgrading ADAM https://github.com/bigdatagenomics/adam to Spark 1.3.1 (cf. adam#690 https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383); attempting to build against Hadoop 1.0.4 yields errors like: 2015-06-02 15:57:44 ERROR Executor:96 - Exception in task

Re: Re: spark 1.3.1 jars in repo1.maven.org

2015-06-02 Thread Ryan Williams
:149) Best Regards, Shixiong Zhu 2015-06-03 0:08 GMT+08:00 Ryan Williams ryan.blake.willi...@gmail.com: I think this is causing issues upgrading ADAM https://github.com/bigdatagenomics/adam to Spark 1.3.1 (cf. adam#690 https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383

Re: Re: spark 1.3.1 jars in repo1.maven.org

2015-06-02 Thread Sean Owen
$1.apply$mcV$sp(ContextCleaner.scala:149) Best Regards, Shixiong Zhu 2015-06-03 0:08 GMT+08:00 Ryan Williams ryan.blake.willi...@gmail.com: I think this is causing issues upgrading ADAM https://github.com/bigdatagenomics/adam to Spark 1.3.1 (cf. adam#690 https://github.com/bigdatagenomics

Re: Re: spark 1.3.1 jars in repo1.maven.org

2015-06-02 Thread Ryan Williams
I think this is causing issues upgrading ADAM https://github.com/bigdatagenomics/adam to Spark 1.3.1 (cf. adam#690 https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383); attempting to build against Hadoop 1.0.4 yields errors like: 2015-06-02 15:57:44 ERROR Executor:96

Spark 1.3.1 On Mesos Issues.

2015-06-01 Thread John Omernik
All - I am facing an odd issue and I am not really sure where to go for support at this point. I am running MapR, which complicates things as it relates to Mesos; however this HAS worked in the past with no issues, so I am stumped here. So for starters, here is what I am trying to run. This is a

Spark 1.3.1 bundle does not build - unresolved dependency

2015-06-01 Thread Stephen Boesch
I downloaded the 1.3.1 distro tarball $ll ../spark-1.3.1.tar.gz -rw-r-@ 1 steve staff 8500861 Apr 23 09:58 ../spark-1.3.1.tar.gz However the build on it is failing with an unresolved dependency: *configuration not public* $ build/sbt assembly -Dhadoop.version=2.5.2 -Pyarn -Phadoop-2.4

Re: Spark 1.3.1 On Mesos Issues.

2015-06-01 Thread Dean Wampler
(but apparently not in local mode) on one or more nodes? (Side question: Does your node experiment fail on all nodes?) Put another way, are the classpaths good for all JVM tasks? 2. Can you use just MapR and Spark 1.3.1 successfully, bypassing Mesos? Incidentally, how are you combining Mesos and MapR? Are you

Caching parquet table (with GZIP) on Spark 1.3.1

2015-05-26 Thread shshann
We tried to cache a table through hiveCtx = HiveContext(sc) hiveCtx.cacheTable("table name") as described in Spark 1.3.1's documentation, and we're on CDH5.3.0 with Spark 1.3.1 built with Hadoop 2.6. The following error message occurs if we try to cache a table with parquet format GZIP though we're

Re: spark 1.3.1 jars in repo1.maven.org

2015-05-20 Thread Sean Owen
Yes, the published artifacts can only refer to one version of anything (OK, modulo publishing a large number of variants under classifiers). You aren't intended to rely on Spark's transitive dependencies for anything. Compiling against the Spark API has no relation to what version of Hadoop it

Re: Re: spark 1.3.1 jars in repo1.maven.org

2015-05-20 Thread Sean Owen
. More anon, Cheers, Edward Original Message Subject: Re: spark 1.3.1 jars in repo1.maven.org Date: 2015-05-20 00:38 From: Sean Owen so...@cloudera.com To: Edward Sargisson esa...@pobox.com Cc: user user@spark.apache.org Yes, the published artifacts can only refer

Re: Spark 1.3.1 - SQL Issues

2015-05-20 Thread Davies Liu
The docs had been updated. You should convert the DataFrame to RDD by `df.rdd` On Mon, Apr 20, 2015 at 5:23 AM, ayan guha guha.a...@gmail.com wrote: Hi Just upgraded to Spark 1.3.1. I am getting an warning Warning (from warnings module): File D:\spark\spark-1.3.1-bin-hadoop2.6\spark
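A tiny sketch of that conversion (Scala shown here; the parquet path and column access are illustrative only):

    // DataFrame-level API where it suffices...
    val df = sqlContext.parquetFile("/data/events.parquet")

    // ...and an explicit df.rdd when an RDD[Row] is needed for row-level work
    val firstCols = df.rdd.map(_.getString(0))
    firstCols.take(5).foreach(println)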

Fwd: Re: spark 1.3.1 jars in repo1.maven.org

2015-05-20 Thread Edward Sargisson
libraries are available in the classloader from Spark and don't clash with existing libraries we have. More anon, Cheers, Edward Original Message Subject: Re: spark 1.3.1 jars in repo1.maven.org Date: 2015-05-20 00:38 From: Sean Owen so...@cloudera.com To: Edward Sargisson esa

Re: Spark 1.3.1 - SQL Issues

2015-05-20 Thread ayan guha
Thanks a bunch On 21 May 2015 07:11, Davies Liu dav...@databricks.com wrote: The docs had been updated. You should convert the DataFrame to RDD by `df.rdd` On Mon, Apr 20, 2015 at 5:23 AM, ayan guha guha.a...@gmail.com wrote: Hi Just upgraded to Spark 1.3.1. I am getting an warning

Spark 1.3.1 Performance Tuning/Patterns for Data Generation Heavy/Throughput Jobs

2015-05-19 Thread Night Wolf
Hi all, I have a job that, for every row, creates about 20 new objects (i.e. RDD of 100 rows in = RDD 2000 rows out). The reason for this is each row is tagged with a list of the 'buckets' or 'windows' it belongs to. The actual data is about 10 billion rows. Each executor has 60GB of memory.
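The expansion step described above is essentially a flatMap; a sketch in which the bucketing function and input RDD are stand-ins, not the original code:

    import org.apache.spark.storage.StorageLevel

    // One output record per (bucket, row) pair
    case class Tagged(bucket: Int, row: String)

    // Stand-in for the real windowing logic (~20 buckets per row in the case above)
    def assignBuckets(row: String): Seq[Int] = 0 until 20

    // inputRows: RDD[String] — assumed to exist
    val expanded = inputRows.flatMap(row => assignBuckets(row).map(b => Tagged(b, row)))

    // Serialized caching keeps the 20x blow-up cheaper on the heap than plain MEMORY_ONLY
    expanded.persist(StorageLevel.MEMORY_AND_DISK_SER)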

RE: Spark 1.3.1 Performance Tuning/Patterns for Data Generation Heavy/Throughput Jobs

2015-05-19 Thread Evo Eftimov
Object Serialized form and Spark uses Tachyon for that purpose – a distributed In-Memory File System – which is off the JVM heap and hence avoids GC From: Night Wolf [mailto:nightwolf...@gmail.com] Sent: Tuesday, May 19, 2015 9:36 AM To: user@spark.apache.org Subject: Spark 1.3.1 Performance

spark 1.3.1 jars in repo1.maven.org

2015-05-19 Thread Edward Sargisson
Hi, I'd like to confirm an observation I've just made. Specifically that spark is only available in repo1.maven.org for one Hadoop variant. The Spark source can be compiled against a number of different Hadoops using profiles. Yay. However, the spark jars in repo1.maven.org appear to be compiled

Re: spark 1.3.1 jars in repo1.maven.org

2015-05-19 Thread Ted Yu
I think your observation is correct. e.g. http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.3.1 shows that it depends on hadoop-client http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client from hadoop 2.2 Cheers On Tue, May 19, 2015 at 6:17 PM, Edward Sargisson

Hive partition table + read using hiveContext + spark 1.3.1

2015-05-14 Thread SamyaMaiti
Hi Team, I have a hive partition table with partition column having spaces. When I try to run any query, say a simple Select * from table_name, it fails. *Please note the same was working in spark 1.2.0, now I have upgraded to 1.3.1. Also there is no change in my application code base.* If I

Re: Spark 1.3.1 and Parquet Partitions

2015-05-07 Thread yana
Date: 05/07/2015 7:38 AM (GMT-05:00) To: Olivier Girardot ssab...@gmail.com Cc: user@spark.apache.org Subject: Re: Spark 1.3.1 and Parquet Partitions Olivier, nope. Wildcard extensions don't work. I am debugging the code to figure out what's wrong. I know I am

Re: Spark 1.3.1 and Parquet Partitions

2015-05-07 Thread Vaxuki
May 2015 at 03:32, vasuki vax...@gmail.com wrote: Spark 1.3.1 - i have a parquet file on hdfs partitioned by some string looking like this /dataset/city=London/data.parquet /dataset/city=NewYork/data.parquet /dataset/city=Paris/data.parquet …. I am trying to get to load it using

Re: SparkSQL issue: Spark 1.3.1 + hadoop 2.6 on CDH5.3 with parquet

2015-05-07 Thread felicia
Hi all, Thanks for the help on this case! We finally settled this by adding a jar named parquet-hive-bundle-1.5.0.jar when submitting jobs through spark-submit; this jar file does not exist in our CDH5.3 anyway (we've downloaded it from

Re: SparkSQL issue: Spark 1.3.1 + hadoop 2.6 on CDH5.3 with parquet

2015-05-07 Thread Marcelo Vanzin
On Thu, May 7, 2015 at 7:39 PM, felicia shsh...@tsmc.com wrote: we tried to add /usr/lib/parquet/lib /usr/lib/parquet to SPARK_CLASSPATH and it doesn't seems to work, To add the jars to the classpath you need to use /usr/lib/parquet/lib/*, otherwise you're just adding the directory (and not

SparkSQL issue: Spark 1.3.1 + hadoop 2.6 on CDH5.3 with parquet

2015-05-07 Thread felicia
Hi all, I'm able to run SparkSQL through python/java and retrieve data from an ordinary table, but when trying to fetch data from a parquet table, the following error shows up, which is pretty straightforward, indicating that a parquet-related class was not found; we tried to add /usr/lib/parquet/lib

Re: Spark 1.3.1 and Parquet Partitions

2015-05-07 Thread in4maniac
Hi V, I am assuming that each of the three .parquet paths you mentioned have multiple partitions in them. For eg: [/dataset/city=London/data.parquet/part-r-0.parquet, /dataset/city=London/data.parquet/part-r-1.parquet] I haven't personally used this with hdfs, but I've worked with a similar

Re: Spark 1.3.1 and Parquet Partitions

2015-05-07 Thread Yana Kadiyska
From: Vaxuki Date:05/07/2015 7:38 AM (GMT-05:00) To: Olivier Girardot Cc: user@spark.apache.org Subject: Re: Spark 1.3.1 and Parquet Partitions Olivier Nope. Wildcard extensions don't work I am debugging the code to figure out what's wrong I know I am using 1.3.1 for sure Pardon typos

Re: Spark 1.3.1 and Parquet Partitions

2015-05-07 Thread Olivier Girardot
hdfs://some ip:8029/dataset/*/*.parquet doesn't work for you? On Thu, 7 May 2015 at 03:32, vasuki vax...@gmail.com wrote: Spark 1.3.1 - i have a parquet file on hdfs partitioned by some string looking like this /dataset/city=London/data.parquet /dataset/city=NewYork/data.parquet /dataset

Spark 1.3.1 and Parquet Partitions

2015-05-06 Thread vasuki
Spark 1.3.1 - I have a parquet file on hdfs partitioned by some string, looking like this: /dataset/city=London/data.parquet /dataset/city=NewYork/data.parquet /dataset/city=Paris/data.parquet …. I am trying to load it using sqlContext.parquetFile( hdfs://some ip:8029
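For what it's worth, the Parquet support in Spark 1.3 discovers key=value partition directories when pointed at the dataset root, so a sketch like the following (host and port are placeholders) should expose city as an ordinary column:

    val df = sqlContext.parquetFile("hdfs://namenode:8020/dataset")

    df.printSchema()                            // should list the partition column `city`
    df.filter(df("city") === "London").count()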

spark 1.3.1

2015-05-04 Thread Saurabh Gupta
Hi, I am trying to build the example code given at https://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds The code is: // Import factory methods provided by DataType. import org.apache.spark.sql.types.DataType; // Import StructType and StructField import

Re: spark 1.3.1

2015-05-04 Thread Driesprong, Fokko
Hi Saurabh, Did you check the log of maven? 2015-05-04 15:17 GMT+02:00 Saurabh Gupta saurabh.gu...@semusi.com: HI, I am trying to build a example code given at https://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds code is: // Import factory methods

Re: spark 1.3.1

2015-05-04 Thread Saurabh Gupta
I am really new to this, but what should I look for in the Maven logs? I have tried mvn package -X -e. Should I show the full trace? On Mon, May 4, 2015 at 6:54 PM, Driesprong, Fokko fo...@driesprong.frl wrote: Hi Saurabh, Did you check the log of maven? 2015-05-04 15:17 GMT+02:00 Saurabh Gupta

Re: spark 1.3.1

2015-05-04 Thread Deng Ching-Mallete
Hi, I think you need to import org.apache.spark.sql.types.DataTypes instead of org.apache.spark.sql.types.DataType and use that instead to access the StringType.. HTH, Deng On Mon, May 4, 2015 at 9:37 PM, Saurabh Gupta saurabh.gu...@semusi.com wrote: I am really new to this but what should I

Re: casting timestamp into long fail in Spark 1.3.1

2015-04-30 Thread Michael Armbrust
, 2015 at 3:41 PM, Justin Yip yipjus...@prediction.io wrote: Hello, I was able to cast a timestamp into long using df.withColumn("millis", $"eventTime".cast("long") * 1000) in spark 1.3.0. However, this statement returns a failure with spark 1.3.1. I got the following exception: Exception

Re: casting timestamp into long fail in Spark 1.3.1

2015-04-30 Thread Justin Yip
df.withColumn("millis", $"eventTime".cast("long") * 1000) in spark 1.3.0. However, this statement returns a failure with spark 1.3.1. I got the following exception: Exception in thread "main" org.apache.spark.sql.types.DataTypeException: Unsupported dataType: long. If you have a struct and a field name

casting timestamp into long fail in Spark 1.3.1

2015-04-30 Thread Justin Yip
Hello, I was able to cast a timestamp into long using df.withColumn("millis", $"eventTime".cast("long") * 1000) in spark 1.3.0. However, this statement returns a failure with spark 1.3.1. I got the following exception: Exception in thread "main" org.apache.spark.sql.types.DataTypeException: Unsupported
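A workaround sketch that avoids the string alias "long", which the 1.3.1 cast parser apparently does not recognize; df stands for the DataFrame from the original snippet:

    import org.apache.spark.sql.types.LongType

    // Cast with the DataType object instead of the string name
    val withMillis = df.withColumn("millis", df("eventTime").cast(LongType) * 1000)

    // The SQL-style type name should also be accepted:
    // df("eventTime").cast("bigint") * 1000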

Re: Spark 1.3.1 Hadoop 2.4 Prebuilt package broken ?

2015-04-28 Thread ๏̯͡๏
hadoop 2.4 prebuilt package (tar) from multiple mirrors and direct link. Each time i untar i get below error spark-1.3.1-bin-hadoop2.4/lib/spark-assembly-1.3.1-hadoop2.4.0.jar: (Empty error message) tar: Error exit delayed from previous errors Is it broken ? -- Deepak

Spark 1.3.1 JavaStreamingContext - fileStream compile error

2015-04-28 Thread lokeshkumar
Hi Forum I am facing below compile error when using the fileStream method of the JavaStreamingContext class. I have copied the code from JavaAPISuite.java test class of spark test code. The error message is

Re: Spark 1.3.1 JavaStreamingContext - fileStream compile error

2015-04-28 Thread Akhil Das
How about: JavaPairDStream<LongWritable, Text> input = jssc.fileStream(inputDirectory, LongWritable.class, Text.class, TextInputFormat.class); See the complete example over here
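The Scala equivalent, for comparison (directory and batch interval are placeholders):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))

    // Key, value and (new-API) input format are supplied as type parameters
    val input = ssc.fileStream[LongWritable, Text, TextInputFormat]("/data/incoming")
    input.map(_._2.toString).print()

    ssc.start()
    ssc.awaitTermination()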

Spark 1.3.1 JavaStreamingContext - fileStream compile error

2015-04-27 Thread lokeshkumar
Hi Forum I am facing below compile error when using the fileStream method of the JavaStreamingContext class. I have copied the code from JavaAPISuite.java test class of spark test code. Please help me to find a solution for this.

Spark 1.3.1 Hadoop 2.4 Prebuilt package broken ?

2015-04-27 Thread ๏̯͡๏
I downloaded the 1.3.1 Hadoop 2.4 prebuilt package (tar) from multiple mirrors and the direct link. Each time I untar it I get the error below: spark-1.3.1-bin-hadoop2.4/lib/spark-assembly-1.3.1-hadoop2.4.0.jar: (Empty error message) tar: Error exit delayed from previous errors Is it broken? -- Deepak

RE: Spark 1.3.1 Hadoop 2.4 Prebuilt package broken ?

2015-04-27 Thread Ganelin, Ilya
What command are you using to untar? Are you running out of disk space? -Original Message- From: ÐΞ€ρ@Ҝ (๏̯͡๏) [deepuj...@gmail.com] Sent: Monday, April 27, 2015 11:44 AM Eastern Standard Time To: user Subject: Spark 1.3.1 Hadoop

Re: Spark 1.3.1 Hadoop 2.4 Prebuilt package broken ?

2015-04-27 Thread Sean Owen
get below error spark-1.3.1-bin-hadoop2.4/lib/spark-assembly-1.3.1-hadoop2.4.0.jar: (Empty error message) tar: Error exit delayed from previous errors Is it broken ? -- Deepak

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-23 Thread Sujee Maniyam
Thanks all... btw, s3n load works without any issues with spark-1.3.1-built-for-hadoop 2.4 I tried this on 1.3.1-hadoop26 sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem") val f = sc.textFile("s3n://bucket/file") f.count No it can't find

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-23 Thread Ted Yu
NativeS3FileSystem class is in hadoop-aws jar. Looks like it was not on classpath. Cheers On Thu, Apr 23, 2015 at 7:30 AM, Sujee Maniyam su...@sujee.net wrote: Thanks all... btw, s3n load works without any issues with spark-1.3.1-bulit-for-hadoop 2.4 I tried this on 1.3.1-hadoop26
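Putting those two pieces together, a sketch of the workaround for the hadoop-2.6 build — keys and bucket are placeholders, and hadoop-aws still has to reach the driver and executor classpaths (e.g. via --jars or spark.executor.extraClassPath):

    sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    val f = sc.textFile("s3n://bucket/file")
    println(f.count())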

Does Spark 1.3.1 support building with Scala 2.8?

2015-04-23 Thread guoqing0...@yahoo.com.hk
Does Spark 1.3.1 support building with Scala 2.8? Can it be integrated with kafka_2.8.0-0.8.0 if built with Scala 2.10? Thanks.
