Re: org/I0Itec/zkclient/serialize/ZkSerializer ClassNotFound

2014-10-21 Thread Akhil Das
You can add this jar http://central.maven.org/maven2/com/101tec/zkclient/0.3/zkclient-0.3.jar in the classpath to get rid of this. If you are hitting further exceptions like ClassNotFound for metrics* etc., then make sure you have all these jars in the classpath:
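A minimal sketch of putting extra jars on the classpath with spark-submit (the application class, master URL and jar paths below are placeholders):

    ./bin/spark-submit \
      --class com.example.MyApp \
      --master spark://master:7077 \
      --jars /path/to/zkclient-0.3.jar,/path/to/metrics-core-2.2.0.jar \
      my-app.jar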

Re: Does start-slave.sh use the values in conf/slaves to launch a worker in Spark standalone cluster mode

2014-10-21 Thread Akhil Das
What about start-all.sh or start-slaves.sh? Thanks Best Regards On Tue, Oct 21, 2014 at 10:25 AM, Soumya Simanta soumya.sima...@gmail.com wrote: I'm working on a cluster where I need to start the workers separately and connect them to a master. I'm following the instructions here and using

Re: default parallelism bug?

2014-10-21 Thread Olivier Girardot
Hi, what do you mean by pretty small ? How big is your file ? Regards, Olivier. 2014-10-21 6:01 GMT+02:00 Kevin Jung itsjb.j...@samsung.com: I use Spark 1.1.0 and set these options to spark-defaults.conf spark.scheduler.mode FAIR spark.cores.max 48 spark.default.parallelism 72 Thanks,

Re: Convert Iterable to RDD

2014-10-21 Thread Olivier Girardot
I don't think this is provided out of the box, but you can use toSeq on your Iterable and if the Iterable is lazy, it should stay that way for the Seq. And then you can use sc.parallelize(my-iterable.toSeq) so you'll have your RDD. For the Iterable[Iterable[T]] you can flatten it and then create
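A minimal sketch of that approach, assuming an existing SparkContext named sc:

    val it: Iterable[Int] = Seq(1, 2, 3)
    val rdd = sc.parallelize(it.toSeq)                    // RDD[Int]

    // For an Iterable[Iterable[T]], flatten first and then parallelize:
    val nested: Iterable[Iterable[Int]] = Seq(Seq(1, 2), Seq(3))
    val flatRdd = sc.parallelize(nested.flatten.toSeq)    // RDD[Int]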

Re: RDD to Multiple Tables SparkSQL

2014-10-21 Thread Olivier Girardot
If you already know your keys the best way would be to extract one RDD per key (it would not bring the content back to the master and you can take advantage of the caching features) and then execute a registerTempTable by Key. But I'm guessing, you don't know the keys in advance, and in this
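A rough sketch of the known-keys case (the table and key names here are made up; it assumes the full data set is already registered as a table):

    val keys = Seq("us", "uk", "de")                       // keys assumed to be known in advance
    keys.foreach { k =>
      val subset = sqlContext.sql(s"SELECT * FROM events WHERE country = '$k'")
      subset.cache()                                       // keep each per-key RDD around
      subset.registerTempTable(s"events_$k")
    }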

Spark MLLIB Decision Tree - ArrayIndexOutOfBounds Exception

2014-10-21 Thread lokeshkumar
Hi All, I am trying to run the spark example JavaDecisionTree code using some external data set. It works for certain datasets only with specific maxBins and maxDepth settings. Even for a working dataset, if I add a new data item I get an ArrayIndexOutOfBoundsException; I get the same exception

Re: What does KryoException: java.lang.NegativeArraySizeException mean?

2014-10-21 Thread Fengyun RAO
Thanks, Guillaume. Below is when the exception happens; nothing has spilled to disk yet. And there isn't a join, but a partitionBy and groupBy action. Actually if numPartitions is small, it succeeds, while if it's large, it fails. Partition was simply done by override def getPartition(key:

[SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Pierre B
Hi! The RANK function is available in hive since version 0.11. When trying to use it in SparkSQL, I'm getting the following exception (full stacktrace below): java.lang.ClassCastException: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$RankBuffer cannot be cast to

Getting Spark SQL talking to Sql Server

2014-10-21 Thread Ashic Mahtab
Hi, Is there a simple way to run Spark SQL queries against SQL Server databases? Or are we limited to running SQL and doing sc.parallelize()? Being able to query small amounts of lookup info directly from Spark can save a bunch of annoying ETL, and I'd expect Spark SQL to have some way of doing

Custom s3 endpoint

2014-10-21 Thread bobrik
I have an s3-compatible service and I'd like to have access to it in Spark. From what I have gathered, I need to add s3service.s3-endpoint=my_s3_endpoint to the file jets3t.properties in the classpath. I'm not a Java programmer and I'm not sure where to put it in a hello-world example. I managed to make it

Re: Getting Spark SQL talking to Sql Server

2014-10-21 Thread Cheng Lian
Instead of using Spark SQL, you can use JdbcRDD to extract data from SQL server. Currently Spark SQL can't run queries against SQL server. The foreign data source API planned in Spark 1.2 can make this possible. On 10/21/14 6:26 PM, Ashic Mahtab wrote: Hi, Is there a simple way to run spark
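A rough sketch of the JdbcRDD route against SQL Server (driver class, connection string and query are illustrative; the query must carry the two '?' bounds used for partitioning):

    import java.sql.DriverManager
    import org.apache.spark.rdd.JdbcRDD

    val lookup = new JdbcRDD(
      sc,
      () => {
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
        DriverManager.getConnection("jdbc:sqlserver://dbhost;databaseName=lookupdb", "user", "secret")
      },
      "SELECT id, name FROM lookup WHERE id >= ? AND id <= ?",
      1, 10000, 2,                                   // lower bound, upper bound, number of partitions
      rs => (rs.getInt("id"), rs.getString("name")))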

create a Row Matrix

2014-10-21 Thread viola
Hi, I am VERY new to spark and mllib and ran into a couple of problems while trying to reproduce some examples. I am aware that this is a very simple question but could somebody please give me an example - how to create a RowMatrix in scala with the following entries: [1 2 3 4]? I would like to
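A minimal sketch for exactly that 2x2 matrix, assuming an existing SparkContext named sc:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0),      // first row:  [1 2]
      Vectors.dense(3.0, 4.0)))     // second row: [3 4]
    val mat = new RowMatrix(rows)
    println(mat.numRows() + " x " + mat.numCols())   // 2 x 2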

Re: spark sql: join sql fails after sqlCtx.cacheTable()

2014-10-21 Thread tridib
val sqlContext = new org.apache.spark.sql.SQLContext(sc) val personPath = /hdd/spark/person.json val person = sqlContext.jsonFile(personPath) person.printSchema() person.registerTempTable(person) val addressPath = /hdd/spark/address.json val address = sqlContext.jsonFile(addressPath)

Re: why fetch failed

2014-10-21 Thread marylucy
Thank you, it works! Akka timeout may be the bottleneck in my system. On Oct 20, 2014, 17:07, Akhil Das ak...@sigmoidanalytics.com wrote: I used to hit this issue when my data size was too large and the number of partitions was too large ( 1200 ); I got rid of it by - Reducing the number of

Re: why fetch failed

2014-10-21 Thread marylucy
Thanks, I need to check whether Spark 1.1.0 contains it. On Oct 21, 2014, 0:01, DB Tsai dbt...@dbtsai.com wrote: I ran into the same issue when the dataset is very big. Marcelo from Cloudera found that it may be caused by SPARK-2711, so their Spark 1.1 release reverted SPARK-2711, and the issue is gone.

RE: Getting Spark SQL talking to Sql Server

2014-10-21 Thread Ashic Mahtab
Thanks. Didn't know about jdbcrdd...should do nicely for now. The foreign data source api looks interesting... Date: Tue, 21 Oct 2014 20:33:03 +0800 From: lian.cs@gmail.com To: as...@live.com; user@spark.apache.org Subject: Re: Getting Spark SQL talking to Sql Server

Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-10-21 Thread Arian Pasquali
That's true Guillaume. I'm currently aggregating documents considering a week as time range. I will have to make it daily and aggregate the results later. thanks for your hints anyway Arian Pasquali http://about.me/arianpasquali 2014-10-20 13:53 GMT+01:00 Guillaume Pitel

Re: Streams: How do RDDs get Aggregated?

2014-10-21 Thread jay vyas
Hi Spark! I found out why my RDDs weren't coming through in my spark stream. It turns out that onStart() needs to return, it seems - i.e. you need to launch the worker part of your start process in a thread. For example def onStartMock():Unit ={ val future = new Thread(new
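A sketch of that pattern with the Receiver API (the data produced here is just a stand-in for the real work):

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class MockReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
      def onStart(): Unit = {
        // Kick off the actual work on its own thread so onStart() returns immediately.
        new Thread("mock-receiver") {
          override def run(): Unit = { while (!isStopped()) store("mock tweet") }
        }.start()
      }
      def onStop(): Unit = { }
    }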

Re: How do you write a JavaRDD into a single file

2014-10-21 Thread Steve Lewis
Collect will store the entire output in a List in memory. This solution is acceptable for Little Data problems although if the entire problem fits in the memory of a single machine there is less motivation to use Spark. Most problems which benefit from Spark are large enough that even the data

Re: spark sql: join sql fails after sqlCtx.cacheTable()

2014-10-21 Thread Rishi Yadav
Hi Tridib, I changed SQLContext to HiveContext and it started working. These are the steps I used. val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) val person = sqlContext.jsonFile(json/person.json) person.printSchema() person.registerTempTable(person) val address =

Re: spark sql: join sql fails after sqlCtx.cacheTable()

2014-10-21 Thread tridib
Hmm... I thought HiveContext will only work if Hive is present. I am curious to know when to use HiveContext and when to use SqlContext. Thanks Regards Tridib -- View this message in context:

Spark Cassandra connector issue

2014-10-21 Thread Ankur Srivastava
Hi, I am creating a cassandra java rdd and transforming it using the where clause. It works fine when I run it outside the mapValues, but when I put the code in mapValues I get an error while creating the transformation. Below is my sample code: CassandraJavaRDDReferenceData

Re: [SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Michael Armbrust
No, analytic and window functions do not work yet. On Tue, Oct 21, 2014 at 3:00 AM, Pierre B pierre.borckm...@realimpactanalytics.com wrote: Hi! The RANK function is available in hive since version 0.11. When trying to use it in SparkSQL, I'm getting the following exception (full

disk-backing pyspark rdds?

2014-10-21 Thread Eric Jonas
Hi All! I'm getting my feet wet with pySpark for the fairly boring case of doing parameter sweeps for monte carlo runs. Each of my functions runs for a very long time (2h+) and returns numpy arrays on the order of ~100 MB. That is, my spark applications look like def foo(x):

stage failure: Task 0 in stage 0.0 failed 4 times

2014-10-21 Thread freedafeng
What could cause this type of 'stage failure'? Thanks! This is a simple PySpark script to list data in HBase. Command line: ./spark-submit --driver-class-path ~/spark-examples-1.1.0-hadoop2.3.0.jar /root/workspace/test/sparkhbase.py 14/10/21 17:53:50 INFO BlockManagerInfo: Added

Re: stage failure: Task 0 in stage 0.0 failed 4 times

2014-10-21 Thread freedafeng
maybe set up a hbase.jar in the conf? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/stage-failure-Task-0-in-stage-0-0-failed-4-times-tp16928p16929.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: SparkSQL - TreeNodeException for unresolved attributes

2014-10-21 Thread Terry Siu
Just to follow up, the queries worked against master and I got my whole flow rolling. Thanks for the suggestion! Now if only Spark 1.2 will come out with the next release of CDH5 :P -Terry From: Terry Siu terry@smartfocus.com Date: Monday, October 20, 2014

Re: spark sql: join sql fails after sqlCtx.cacheTable()

2014-10-21 Thread Michael Armbrust
Hmm... I thought HiveContext will only work if Hive is present. I am curious to know when to use HiveContext and when to use SqlContext. http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started TLDR; Always use HiveContext if your application does not have a dependency

How to set hadoop native library path in spark-1.1

2014-10-21 Thread Pradeep Ch
Hi all, Can anyone tell me how to set the native library path in Spark. Right now I am setting it using the SPARK_LIBRARY_PATH environment variable in spark-env.sh. But still no success. I am still seeing this in spark-shell. NativeCodeLoader: Unable to load native-hadoop library for your

Re: spark sql: join sql fails after sqlCtx.cacheTable()

2014-10-21 Thread tridib
Thanks for pointing that out. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-join-sql-fails-after-sqlCtx-cacheTable-tp16893p16933.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Streams: How do RDDs get Aggregated?

2014-10-21 Thread jay vyas
Oh - and one other note on this, which appears to be the case. If, in your stream's foreachRDD implementation, you do something stupid (like call rdd.count()) tweetStream.foreachRDD((rdd,lent) => { tweetStream.repartition(1) numTweetsCollected+=1; //val count = rdd.count()

How to calculate percentiles with Spark?

2014-10-21 Thread sparkuser
Hi, What would be the best way to get percentiles from a Spark RDD? I can see JavaDoubleRDD or MLlib's MultivariateStatisticalSummary https://spark.apache.org/docs/latest/mllib-statistics.html provide the mean() but not percentiles. Thank you! Horace -- View this message in context:
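One exact (if sort-heavy) workaround is to sort and index the RDD, then look up the element at the desired rank; a rough sketch:

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    def percentile(data: RDD[Double], p: Double): Double = {
      val sorted = data.sortBy(identity).zipWithIndex().map(_.swap)   // (rank, value)
      val n = sorted.count()
      val idx = math.max(0L, math.min(n - 1, math.ceil(p / 100.0 * n).toLong - 1))
      sorted.lookup(idx).head
    }
    // e.g. percentile(myRdd, 95.0) for the 95th percentile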

Spark-Submit Python along with JAR

2014-10-21 Thread TJ Klein
Hi, I'd like to run my python script using spark-submit together with a JAR file containing Java specifications for a Hadoop file system. How can I do that? It seems I can either provide a JAR file or a Python file to spark-submit. So far I have been running my code in ipython with

spark sql: sqlContext.jsonFile date type detection and performance

2014-10-21 Thread tridib
Any help? or comments? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-sqlContext-jsonFile-date-type-detection-and-perforormance-tp16881p16939.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Primitive arrays in Spark

2014-10-21 Thread Akshat Aranya
This is as much of a Scala question as a Spark question. I have an RDD: val rdd1: RDD[(Long, Array[Long])] This RDD has duplicate keys that I can collapse: val rdd2: RDD[(Long, Array[Long])] = rdd1.reduceByKey((a,b) => a ++ b) If I start with an Array of primitive longs in rdd1, will rdd2

MLLib libsvm format

2014-10-21 Thread Sameer Tilak
Hi All, I have a question regarding the ordering of indices. The documentation says that the indices are one-based and in ascending order. However, do the indices within a row need to be sorted in ascending order? Sparse data: It is very common in practice to have sparse training data. MLlib

Re: [SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Pierre B
Ok thanks Michael. In general, what's the easy way to figure out what's already implemented? The exception I was getting was not really helpful here. Also, is there a roadmap document somewhere? Thanks! P. -- View this message in context:

Usage of spark-ec2: how to deploy a revised version of spark 1.1.0?

2014-10-21 Thread freedafeng
Thanks for the help! Hadoop version: 2.3.0 Hbase version: 0.98.1 Use python to read/write data from/to hbase. Only change over the official spark 1.1.0 is the pom file under examples. Compilation: spark:mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package

Re: How to calculate percentiles with Spark?

2014-10-21 Thread lordjoe
A rather more general question is - assume I have a JavaRDD<K> which is sorted - how can I convert this into a JavaPairRDD<Integer,K> where the Integer is the index - 0...N-1? Easy to do on one machine: JavaRDD<K> values = ... // create here JavaPairRDD<Integer,K> positions =
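zipWithIndex (available on the RDD and JavaRDD APIs since Spark 1.0, and consistent with the sort order) gives this pairing directly; a Scala sketch:

    import org.apache.spark.SparkContext._

    val values = sc.parallelize(Seq("b", "a", "c")).sortBy(identity)
    // yields (0, "a"), (1, "b"), (2, "c")
    val positions = values.zipWithIndex().map { case (v, i) => (i.toInt, v) }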

Re: Class not found

2014-10-21 Thread Pat Ferrel
maven cache is laid out differently but it does work on Linux and BSD/mac. Still looks like a hack to me. On Oct 21, 2014, at 1:28 PM, Pat Ferrel p...@occamsmachete.com wrote: Doesn’t this seem like a dangerous error prone hack? It will build different bits on different machines. It doesn’t

com.esotericsoftware.kryo.KryoException: Buffer overflow.

2014-10-21 Thread nitinkak001
I am running a simple rdd filter command. What does it mean? Here is the full stack trace(and code below it): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 133 at com.esotericsoftware.kryo.io.Output.require(Output.java:138) at

SchemaRDD.where clause error

2014-10-21 Thread Kevin Paul
Hi all, I tried to use the function SchemaRDD.where() but got some error: val people = sqlCtx.sql(select * from people) people.where('age === 10) console:27: error: value === is not a member of Symbol where did I go wrong? Thanks, Kevin Paul

buffer overflow when running Kmeans

2014-10-21 Thread Yang
this is the stack trace I got with yarn logs -applicationId really no idea where to dig further. thanks! yang 14/10/21 14:36:43 INFO ConnectionManager: Accepted connection from [ phxaishdc9dn1262.stratus.phx.ebay.com/10.115.58.21] 14/10/21 14:36:47 ERROR Executor: Exception in task ID 98

Re: SchemaRDD.where clause error

2014-10-21 Thread Michael Armbrust
You need to import sqlCtx._ to get access to the implicit conversion. On Tue, Oct 21, 2014 at 2:40 PM, Kevin Paul kevinpaulap...@gmail.com wrote: Hi all, I tried to use the function SchemaRDD.where() but got some error: val people = sqlCtx.sql(select * from people) people.where('age ===
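The working sequence would look roughly like this (assuming the people table from the question is already registered):

    val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
    import sqlCtx._                               // brings the Symbol-to-attribute implicits into scope

    val people = sqlCtx.sql("SELECT * FROM people")
    val tenYearOlds = people.where('age === 10)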

Re: buffer overflow when running Kmeans

2014-10-21 Thread Ted Yu
Just posted below for a similar question. Have you seen this thread ? http://search-hadoop.com/m/JW1q5ezXPH/KryoException%253A+Buffer+overflowsubj=RE+spark+nbsp+kryo+serilizable+nbsp+exception On Tue, Oct 21, 2014 at 2:44 PM, Yang tedd...@gmail.com wrote: this is the stack trace I got

How to read BZ2 XML file in Spark?

2014-10-21 Thread John Roberts
Hi, I want to ingest Open Street Map. It's 43GB (compressed) XML in BZIP2 format. What's your advice for reading it in to an RDD? BTW, the Spark Training at UMD is awesome! I'm having a blast learning Spark. I wish I could go to the MeetUp tonight, but I have kid activities...

Re: spark sql: sqlContext.jsonFile date type detection and perforormance

2014-10-21 Thread Yin Huai
Are there any specific issues you are facing? Thanks, Yin On Tue, Oct 21, 2014 at 4:00 PM, tridib tridib.sama...@live.com wrote: Any help? or comments? -- View this message in context:

Re: MLLib libsvm format

2014-10-21 Thread Xiangrui Meng
Yes. where the indices are one-based and **in ascending order**. -Xiangrui On Tue, Oct 21, 2014 at 1:10 PM, Sameer Tilak ssti...@live.com wrote: Hi All, I have a question regarding the ordering of indices. The document says that the indices indices are one-based and in ascending order.
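For example, a sparse row in LIBSVM format with one-based, sorted indices would look like:

    1.0 1:0.25 3:1.5 7:0.9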

spark ui redirecting to port 8100

2014-10-21 Thread sadhan
Set the Spark port to a different one and the connection seems successful, but I get a 302 to /proxy on port 8100? Nothing is listening on that port either. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-ui-redirecting-to-port-8100-tp16956.html

Re: create a Row Matrix

2014-10-21 Thread Xiangrui Meng
Please check out the example code: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnySVD.scala -Xiangrui On Tue, Oct 21, 2014 at 5:34 AM, viola viola.wiersc...@siemens.com wrote: Hi, I am VERY new to spark and mllib and ran into a

RE: MLLib libsvm format

2014-10-21 Thread Sameer Tilak
Great, I will sort them. Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone. Original message - From: Xiangrui Meng men...@gmail.com Date: 10/21/2014 3:29 PM (GMT-08:00) To: Sameer Tilak ssti...@live.com Cc: user@spark.apache.org

Re: How to read BZ2 XML file in Spark?

2014-10-21 Thread sameerf
Hi John, Glad you're enjoying the Spark training at UMD. Is the 43 GB XML data in a single file or split across multiple BZIP2 files? Is the file in a HDFS cluster or on a single linux machine? If you're using BZIP2 with splittable compression (in HDFS), you'll need at least Hadoop 1.1:

Re: spark ui redirecting to port 8100

2014-10-21 Thread Sameer Farooqui
Hi Sadhan, Which port are you specifically trying to redirect? The driver program has a web UI, typically on port 4040... or the Spark Standalone Cluster Master has a UI exposed on port 7077. Which setting did you update in which file to make this change? And finally, which version of Spark are

Spark Streaming - How to write RDD's in same directory ?

2014-10-21 Thread Shailesh Birari
Hello, Spark 1.1.0, Hadoop 2.4.1 I have written a Spark streaming application. And I am getting FileAlreadyExistsException for rdd.saveAsTextFile(outputFolderPath). Here is briefly what I am trying to do. My application is creating a text file stream using the Java streaming context. The input file is

Re: Usage of spark-ec2: how to deploy a revised version of spark 1.1.0?

2014-10-21 Thread sameerf
Hi, Can you post what the error looks like? Sameer F. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Usage-of-spark-ec2-how-to-deploy-a-revised-version-of-spark-1-1-0-tp16943p16963.html Sent from the Apache Spark User List mailing list archive at

Re: Spark MLLIB Decision Tree - ArrayIndexOutOfBounds Exception

2014-10-21 Thread Joseph Bradley
Hi, this sounds like a bug which has been fixed in the current master. What version of Spark are you using? Would it be possible to update to the current master? If not, it would be helpful to know some more of the problem dimensions (num examples, num features, feature types, label type).

Re: Spark Streaming - How to write RDD's in same directory ?

2014-10-21 Thread Sameer Farooqui
Hi Shailesh, Spark just leverages the Hadoop File Output Format to write out the RDD you are saving. This is really a Hadoop OutputFormat limitation which requires the directory it is writing into to not exist. The idea is that a Hadoop job should not be able to overwrite the results from a
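One common pattern is therefore to give every batch its own directory; a sketch (assuming a DStream named lines and an HDFS output root of your choosing):

    // DStream.saveAsTextFiles writes one directory per batch: <prefix>-<batch time ms>[.<suffix>]
    lines.saveAsTextFiles("hdfs:///output/run", "txt")

    // Or build a timestamped path yourself inside foreachRDD:
    lines.foreachRDD { (rdd, time) =>
      rdd.saveAsTextFile(s"hdfs:///output/run-${time.milliseconds}")
    }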

Using the DataStax Cassandra Connector from PySpark

2014-10-21 Thread Mike Sukmanowsky
Hi there, I'm using Spark 1.1.0 and experimenting with trying to use the DataStax Cassandra Connector (https://github.com/datastax/spark-cassandra-connector) from within PySpark. As a baby step, I'm simply trying to validate that I have access to classes that I'd need via Py4J. Sample python

Re: Primitive arrays in Spark

2014-10-21 Thread Matei Zaharia
It seems that ++ does the right thing on arrays of longs, and gives you another one: scala> val a = Array[Long](1,2,3) a: Array[Long] = Array(1, 2, 3) scala> val b = Array[Long](1,2,3) b: Array[Long] = Array(1, 2, 3) scala> a ++ b res0: Array[Long] = Array(1, 2, 3, 1, 2, 3) scala> res0.getClass
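So the reduceByKey from the original question should concatenate the arrays as expected; a small sketch (the pair-RDD import is needed on Spark 1.1):

    import org.apache.spark.SparkContext._

    val rdd1 = sc.parallelize(Seq((1L, Array(1L, 2L)), (1L, Array(3L)), (2L, Array(4L))))
    val rdd2 = rdd1.reduceByKey(_ ++ _)   // (1, Array(1, 2, 3)), (2, Array(4))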

Re: Spark Streaming - How to write RDD's in same directory ?

2014-10-21 Thread Shailesh Birari
Thanks Sameer for quick reply. I will try to implement it. Shailesh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-How-to-write-RDD-s-in-same-directory-tp16962p16970.html Sent from the Apache Spark User List mailing list archive at

Re: spark sql not able to find classes with --jars option

2014-10-21 Thread sadhan
It was mainly because spark was setting the jar classes in a thread local context classloader. The quick fix was to make our serde use the context classloader first. -- View this message in context:

Re: Strategies for reading large numbers of files

2014-10-21 Thread Landon Kuhn
Thanks to folks here for the suggestions. I ended up settling on what seems to be a simple and scalable approach. I am no longer using sparkContext.textFiles with wildcards (it is too slow when working with a large number of files). Instead, I have implemented directory traversal as a Spark job,

Re: spark sql: sqlContext.jsonFile date type detection and performance

2014-10-21 Thread tridib
Yes, I am unable to get jsonFile() to detect the date type automatically from the JSON data. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-sqlContext-jsonFile-date-type-detection-and-perforormance-tp16881p16974.html Sent from the Apache

Spark Streaming Applications

2014-10-21 Thread Saiph Kappa
Hi, I have been trying to find a fairly complex application that makes use of the Spark Streaming framework. I checked public github repos but the examples I found were too simple, only comprising simple operations like counters and sums. On the Spark summit website, I could find very interesting

spark 1.1.0 RDD and Calliope 1.1.0-CTP-U2-H2

2014-10-21 Thread Tian Zhang
Hi, I am using the latest calliope library from tuplejump.com to create RDD for cassandra table. I am on a 3 nodes spark 1.1.0 with yarn. My cassandra table is defined as below and I have about 2000 rows of data inserted. CREATE TABLE top_shows ( program_id varchar, view_minute timestamp,

Re: Spark Cassandra connector issue

2014-10-21 Thread Ankur Srivastava
Is this because I am calling a transformation function on an rdd from inside another transformation function? Is it not allowed? Thanks Ankut On Oct 21, 2014 1:59 PM, Ankur Srivastava ankur.srivast...@gmail.com wrote: Hi Gerard, this is the code that may be helpful. public class

Re: Spark SQL : sqlContext.jsonFile date type detection and performance

2014-10-21 Thread Yin Huai
Add one more thing about question 1. Once you get the SchemaRDD from jsonFile/jsonRDD, you can use CAST(columnName as DATE) in your query to cast the column type from the StringType to DateType (the string format should be yyyy-[m]m-[d]d and you need to use hiveContext). Here is the code snippet
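A sketch of that cast (table and column names are made up; the dob column is assumed to hold yyyy-[m]m-[d]d strings):

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val people = hiveContext.jsonFile("/hdd/spark/person.json")
    people.registerTempTable("person")
    // CAST turns the inferred StringType column into a DATE
    val withDates = hiveContext.sql("SELECT name, CAST(dob AS DATE) AS dob FROM person")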

Re: Spark - HiveContext - Unstructured Json

2014-10-21 Thread Cheng Lian
You can resort to SQLContext.jsonFile(path: String, samplingRate: Double) and set samplingRate to 1.0, so that all the columns can be inferred. You can also use SQLContext.applySchema to specify your own schema (which is a StructType). On 10/22/14 5:56 AM, Harivardan Jayaraman
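A sketch of both options (field names and types below are illustrative):

    import org.apache.spark.sql._

    val sqlContext = new SQLContext(sc)

    // 1. Infer the schema from every record rather than a sample:
    val inferred = sqlContext.jsonFile("/path/to/data.json", 1.0)

    // 2. Or build the schema yourself and apply it to an RDD[Row]:
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))
    val rowRDD = sc.textFile("/path/to/data.csv")
      .map(_.split(","))
      .map(p => Row(p(0), p(1).trim.toInt))
    val withSchema = sqlContext.applySchema(rowRDD, schema)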

Re: Asynchronous Broadcast from driver to workers, is it possible?

2014-10-21 Thread Peng Cheng
Looks like the only way is to implement that feature. There is no way of hacking it into working -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Asynchronous-Broadcast-from-driver-to-workers-is-it-possible-tp15758p16985.html Sent from the Apache Spark User

Re: com.esotericsoftware.kryo.KryoException: Buffer overflow.

2014-10-21 Thread Koert Kuipers
you ran out of kryo buffer. are you using spark 1.1 (which supports buffer resizing) or spark 1.0 (which has a fixed size buffer)? On Oct 21, 2014 5:30 PM, nitinkak001 nitinkak...@gmail.com wrote: I am running a simple rdd filter command. What does it mean? Here is the full stack trace(and code
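On 1.1 the buffer sizes are plain Spark properties; a sketch with illustrative values:

    val conf = new org.apache.spark.SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.mb", "8")         // initial per-task buffer
      .set("spark.kryoserializer.buffer.max.mb", "512")   // ceiling the buffer may grow to (Spark 1.1+)
    val sc = new org.apache.spark.SparkContext(conf)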

Re: Spark MLLIB Decision Tree - ArrayIndexOutOfBounds Exception

2014-10-21 Thread lokeshkumar
Hi Joseph, I am using spark 1.1.0, the latest version; I will try to update to the current master and check. The example I am running is JavaDecisionTree, the dataset is of libsvm format containing 1. 45 instances of training samples, 2. 5 features, 3. I am not sure what the feature type is, but

Num-executors and executor-cores overwritten by defaults

2014-10-21 Thread Ilya Ganelin
Hi all. Just upgraded our cluster to CDH 5.2 (with Spark 1.1) but now I can no longer set the number of executors or executor-cores. No matter what values I pass on the command line to spark they are overwritten by the defaults. Does anyone have any idea what could have happened here? Running on

spark sql query optimization , and decision tree building

2014-10-21 Thread sanath kumar
Hi all, I have a large data in text files (1,000,000 lines). Each line has 128 columns. Here each line is a feature and each column is a dimension. I have converted the txt files to json format and am able to run sql queries on json files using spark. Now I am trying to build a k dimension

Re: create a Row Matrix

2014-10-21 Thread viola
Thanks for the quick response. However, I still only get error messages. I am able to load a .txt file with entries in it and use it in Spark, but I am not able to create a simple matrix, for instance a 2x2 row matrix [1 2 3 4]. I tried variations such as val RowMatrix =