Re: Convert SchemaRDD to RDD

2015-10-16 Thread Ted Yu
case 1 => val1 > case 2 => val2 > //... cases up to 26 > } > } > hence expecting an approach to convert SchemaRDD to RDD without using > Tuple or Case Class as we have restrictions in Scala 2.10 > > Regards > Satish Chandra >

Convert SchemaRDD to RDD

2015-10-16 Thread satish chandra j
Hi All, To convert SchemaRDD to RDD the below snippet works if the SQL statement has fewer than 22 columns in a row, as per the tuple restriction: rdd.map(row => row.toString) But if the SQL statement has more than 22 columns then the above snippet errors with "*object Tuple27 is not a member of
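
A minimal sketch of the usual workaround, assuming an existing SchemaRDD named schemaRDD (Spark 1.x API): Row is Seq-like, so mkString sidesteps the Tuple22 limit entirely, and positional getters avoid case classes altogether.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row
    // Row behaves like a Seq: no tuple or case class needed, any column count works
    val asStrings: RDD[String] = schemaRDD.map((row: Row) => row.mkString("|"))
    // Typed access by position also works past 22 columns
    val pairs = schemaRDD.map(row => (row.getString(0), row.getDouble(25)))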

Re: Convert SchemaRDD to RDD

2015-10-16 Thread Ted Yu
Have you seen this thread ? http://search-hadoop.com/m/q3RTt9YBFr17u8j8=Scala+Limitation+Case+Class+definition+with+more+than+22+arguments On Fri, Oct 16, 2015 at 7:41 AM, satish chandra j <jsatishchan...@gmail.com> wrote: > Hi All, > To convert SchemaRDD to RDD below snippet is wo

Re: Convert SchemaRDD to RDD

2015-10-16 Thread satish chandra j
(that: Any): Boolean = that.isInstanceOf[MyRecord] def productArity: Int = 26 // example value, it is the number of arguments def productElement(n: Int): Serializable = n match { case 1 => val1 case 2 => val2 //... cases up to 26 } } hence expecting an approach to convert
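
For reference, a compilable sketch of the Product approach quoted above, trimmed to three fields; note that productElement is zero-based, so the one-based case numbering in the quoted snippet would need shifting:

    class MyRecord(val val1: String, val val2: String, val val3: String)
        extends Product with Serializable {
      def canEqual(that: Any): Boolean = that.isInstanceOf[MyRecord]
      def productArity: Int = 3                 // 26 in the original use case
      def productElement(n: Int): Any = n match {
        case 0 => val1                          // zero-based, unlike the quoted code
        case 1 => val2
        case 2 => val3
        case _ => throw new IndexOutOfBoundsException(n.toString)
      }
    }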

Converting a DStream to schemaRDD

2015-09-29 Thread Daniel Haviv
Hi, I have a DStream which is a stream of RDD[String]. How can I pass a DStream to sqlContext.jsonRDD and work with it as a DF ? Thank you. Daniel

Re: Converting a DStream to schemaRDD

2015-09-29 Thread Adrian Tanase
, September 29, 2015 at 5:09 PM To: Daniel Haviv, user Subject: RE: Converting a DStream to schemaRDD Something like: dstream.foreachRDD { rdd => val df = sqlContext.read.json(rdd) df.select(…) } https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operati
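
A slightly fuller sketch of the same pattern, assuming a SQLContext named sqlContext and a DStream[String] of JSON named jsonStream; read.json is the Spark 1.4+ spelling, earlier versions use sqlContext.jsonRDD(rdd), and the column names are hypothetical:

    jsonStream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {                  // skip empty micro-batches (Spark 1.3+)
        val df = sqlContext.read.json(rdd)   // schema inferred per batch
        df.select("path", "status").show()   // work with it as a regular DataFrame
      }
    }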

RE: Converting a DStream to schemaRDD

2015-09-29 Thread Ewan Leith
ork it as if it were a standard RDD dataset. Ewan From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] Sent: 29 September 2015 15:03 To: user <user@spark.apache.org> Subject: Converting a DStream to schemaRDD Hi, I have a DStream which is a stream of RDD[String]. How can I pass

Re: Nested DataFrame(SchemaRDD)

2015-06-24 Thread Richard Catlin
-in-spark-sql/ 2015-06-23 16:12 GMT-07:00 Richard Catlin richard.m.cat...@gmail.com: How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a column? Is there an example? Will this store as a nested parquet file? Thanks. Richard Catlin

Re: Nested DataFrame(SchemaRDD)

2015-06-23 Thread Roberto Congiu
I wrote a brief howto on building nested records in spark and storing them in parquet here: http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/ 2015-06-23 16:12 GMT-07:00 Richard Catlin richard.m.cat...@gmail.com: How do I create a DataFrame(SchemaRDD) with a nested array of Rows
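
The core idea of the linked howto, as a hedged sketch against the Spark 1.3-era API: nested case classes become struct columns, and a Seq of them becomes an array of Rows, which Parquet stores as a repeated group:

    case class Item(name: String, qty: Int)
    case class Order(id: Long, items: Seq[Item])   // nested array-of-struct column

    import sqlContext.implicits._
    val df = sc.parallelize(Seq(Order(1L, Seq(Item("a", 2), Item("b", 1))))).toDF()
    df.printSchema()                               // items: array<struct<name,qty>>
    df.saveAsParquetFile("/tmp/orders.parquet")    // written as nested Parquet groups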

RE: Nested DataFrame(SchemaRDD)

2015-06-23 Thread Richard Catlin
How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a column? Is there an example? Will this store as a nested parquet file? Thanks. Richard Catlin

Re: Nested DataFrame(SchemaRDD)

2015-06-23 Thread Michael Armbrust
: How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a column? Is there an example? Will this store as a nested parquet file? Thanks. Richard Catlin

Format RDD/SchemaRDD contents to screen?

2015-05-29 Thread Minnow Noir
the contents of an RDD/SchemaRDD on the screen in a formatted way? For example, say I want to take() the first 30 lines/rows in an *RDD and present them in a readable way on the screen so that I can see what's missing or invalid. Obviously, I'm just trying to sample the results in a readable way

Re: Format RDD/SchemaRDD contents to screen?

2015-05-29 Thread ayan guha
Depending on your spark version, you can convert schemaRDD to a dataframe and then use .show() On 30 May 2015 10:33, Minnow Noir minnown...@gmail.com wrote: I'm trying to debug query results inside spark-shell, but finding it cumbersome to save to file and then use file system utils to explore
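
For example (Spark 1.3+, where SchemaRDD has become DataFrame):

    df.show(30)        // prints the first 30 rows as an aligned ASCII table
    df.printSchema()   // column names and types, handy for spotting bad fields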

Spark 1.2.1: How to convert SchemaRDD to CassandraRDD?

2015-04-27 Thread Tash Chainar
Hi all, following the import com.datastax.spark.connector.SelectableColumnRef; import com.datastax.spark.connector.japi.CassandraJavaUtil; import org.apache.spark.sql.SchemaRDD; import static com.datastax.spark.connector.util.JavaApiHelper.toScalaSeq; import scala.collection.Seq; SchemaRDD

Spark SQL: SchemaRDD, DataFrame. Multi-value, Nested attributes

2015-04-22 Thread Eugene Morozov
into one long list of columns as I would be able to find some weird stuff by doing that. So my question is the following: 1. Does SchemaRDD support something like multi-value attributes? It might look like an array of values that lives in just one column. Although it’s not clear how I’d aggregate

saving schemaRDD to cassandra

2015-03-27 Thread Hafiz Mujadid
Hi experts! I would like to know is there any way to store a schemaRDD to cassandra? If yes, then how to store it in an existing cassandra column family and a new column family? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saving-schemaRDD-to-cassandra

Re: SchemaRDD/DataFrame result partitioned according to the underlying datasource partitions

2015-03-23 Thread Michael Armbrust
assumptions about partitioning. On Mon, Mar 23, 2015 at 10:22 AM, Stephen Boesch java...@gmail.com wrote: Is there a way to take advantage of the underlying datasource partitions when generating a DataFrame/SchemaRDD via catalyst? It seems from the sql module that the only options

SchemaRDD/DataFrame result partitioned according to the underlying datasource partitions

2015-03-23 Thread Stephen Boesch
Is there a way to take advantage of the underlying datasource partitions when generating a DataFrame/SchemaRDD via catalyst? It seems from the sql module that the only options are RangePartitioner and HashPartitioner - and further that those are selected automatically by the code

Using regular rdd transforms on schemaRDD

2015-03-17 Thread kpeng1
Hi All, I was wondering how rdd transformations work on schemaRDDs. Is there a way to force the rdd transform to keep the schemaRDD types or do I need to recreate the schemaRDD by applying the applySchema method? Currently what I have is an array of SchemaRDDs and I just want to do a union

Re: Using regular rdd transforms on schemaRDD

2015-03-17 Thread kpeng1
Looks like if I use unionAll this works. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-regular-rdd-transforms-on-schemaRDD-tp22105p22107.html Sent from the Apache Spark User List mailing list archive at Nabble.com
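
That is, for an array of SchemaRDDs with identical schemas, something like this sketch keeps the result a SchemaRDD, whereas the plain RDD union would drop down to an untyped RDD[Row]:

    // assumes schemaRDDs: Array[SchemaRDD], non-empty, all with the same schema
    val combined = schemaRDDs.reduce(_ unionAll _)
    combined.registerTempTable("all_parts")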

Re: Iterate over contents of schemaRDD loaded from parquet file to extract timestamp

2015-03-16 Thread Cheng Lian
from a parquet file, stored in a schemaRDD [7654321,2015-01-01 00:00:00.007,0.49,THU] Since, in spark version 1.1.0, parquet format doesn't support saving timestamp values, I have saved the timestamp data as string. Can you please tell me how to iterate over the data in this schema RDD to retrieve

Iterate over contents of schemaRDD loaded from parquet file to extract timestamp

2015-03-16 Thread anu
Spark Version - 1.1.0 Scala - 2.10.4 I have loaded the following type of data from a parquet file, stored in a schemaRDD [7654321,2015-01-01 00:00:00.007,0.49,THU] Since, in spark version 1.1.0, parquet format doesn't support saving timestamp values, I have saved the timestamp data as string. Can you
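
One hedged way to get the timestamps back out, assuming the four columns shown above are (id, timestamp-as-string, value, day); java.sql.Timestamp.valueOf parses the yyyy-MM-dd HH:mm:ss.fff format directly:

    import java.sql.Timestamp
    val rows = sqlContext.parquetFile("/path/to/data.parquet")   // hypothetical path
    val parsed = rows.map { r =>
      // adjust the getters to the actual column types in the schema
      (r.getInt(0), Timestamp.valueOf(r.getString(1)), r.getDouble(2), r.getString(3))
    }
    parsed.take(5).foreach(println)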

Re: SchemaRDD: SQL Queries vs Language Integrated Queries

2015-03-11 Thread Tobias Pfeiffer
, registerTempTable is just a Map[String, SchemaRDD] insertion, nothing that would be measurable. But there are no distributed/RDD operations involved, I think. Tobias

Re: SchemaRDD: SQL Queries vs Language Integrated Queries

2015-03-11 Thread Cesar Flores
transformers classes for feature extraction, and if I need to save the input and maybe the output SchemaRDD of the transform function in every transformer, this may not be very efficient. Thanks On Tue, Mar 10, 2015 at 8:20 PM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, On Tue, Mar 10, 2015 at 2:13 PM

Re: SchemaRDD: SQL Queries vs Language Integrated Queries

2015-03-10 Thread Tobias Pfeiffer
Hi, On Tue, Mar 10, 2015 at 2:13 PM, Cesar Flores ces...@gmail.com wrote: I am new to the SchemaRDD class, and I am trying to decide between using SQL queries or Language Integrated Queries ( https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD ). Can someone

SchemaRDD: SQL Queries vs Language Integrated Queries

2015-03-10 Thread Cesar Flores
I am new to the SchemaRDD class, and I am trying to decide between using SQL queries or Language Integrated Queries ( https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD ). Can someone tell me what is the main difference between the two approaches, besides using

Re: SchemaRDD: SQL Queries vs Language Integrated Queries

2015-03-10 Thread Reynold Xin
They should have the same performance, as they are compiled down to the same execution plan. Note that starting in Spark 1.3, SchemaRDD is renamed DataFrame: https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html On Tue, Mar 10, 2015 at 2:13
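
For illustration, these two produce the same logical plan (a sketch against the Spark 1.2-era DSL, assuming a SchemaRDD people registered as a table with name and age columns):

    import sqlContext._   // brings the Symbol-based DSL into scope
    val bySql = sqlContext.sql("SELECT name FROM people WHERE age > 21")
    val byDsl = people.where('age > 21).select('name)
    // compare bySql.queryExecution with byDsl.queryExecution to see the shared plan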

Re: Construct model matrix from SchemaRDD automatically

2015-03-05 Thread Evan R. Sparks
Hi Wush, I'm CC'ing user@spark.apache.org (which is the new list) and BCC'ing u...@spark.incubator.apache.org. In Spark 1.3, schemaRDD is in fact being renamed to DataFrame (see: https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html

Construct model matrix from SchemaRDD automatically

2015-03-05 Thread Wush Wu
Dear all, I am a new spark user from R. After exploring the schemaRDD, I notice that it is similar to data.frame. Is there a feature like `model.matrix` in R to convert schemaRDD to model matrix automatically according to the type without explicitly converting them one by one? Thanks, Wush

Spark Streaming and SchemaRDD usage

2015-03-04 Thread Haopu Wang
Hi, in the roadmap of Spark in 2015 (link: http://files.meetup.com/3138542/Spark%20in%202015%20Talk%20-%20Wendell.pptx), I saw SchemaRDD is designed to be the basis of BOTH Spark Streaming and Spark SQL. My question is: what's the typical usage of SchemaRDD in a Spark Streaming application

Spark SQL Converting RDD to SchemaRDD without hardcoding a case class in scala

2015-02-27 Thread kpeng1
Hi All, I am currently trying to build out a spark job that would basically convert a csv file into parquet. From what I have seen it looks like spark sql is the way to go, and how I would go about this would be to load the csv file into an RDD and convert it into a schemaRDD by injecting

Re: Spark SQL Converting RDD to SchemaRDD without hardcoding a case class in scala

2015-02-27 Thread Michael Armbrust
seen it looks like spark sql is the way to go and how I would go about this would be to load the csv file into an RDD and convert it into a schemaRDD by injecting the schema via a case class. What I want to avoid is hardcoding the case class itself. I want to reuse this job
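
A sketch of the usual answer, assuming Spark 1.2's applySchema (renamed createDataFrame in 1.3), a headerless CSV, and column names supplied at runtime instead of a hardcoded case class:

    import org.apache.spark.sql._
    val colNames = Seq("id", "name", "amount")   // e.g. read from a config file
    val schema = StructType(colNames.map(n => StructField(n, StringType, nullable = true)))
    val rowRDD = sc.textFile("/data/in.csv")
      .map(_.split(","))
      .map(parts => Row(parts: _*))              // every column kept as String here
    val schemaRDD = sqlContext.applySchema(rowRDD, schema)
    schemaRDD.saveAsParquetFile("/data/out.parquet")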

Converting SchemaRDD/Dataframe to RDD[vector]

2015-02-26 Thread mobsniuk
I've been searching around and see others have asked similar questions. Given a schemaRDD I extract a result set that contains numbers, both Int and Doubles. How do I construct an RDD[Vector]? In 1.2 I wrote the results to a text file and then read them back in, splitting them with some code I found

Re: Converting SchemaRDD/Dataframe to RDD[vector]

2015-02-26 Thread Xiangrui Meng
have asked similar questions. Given a schemaRDD I extract a result set that contains numbers, both Int and Doubles. How do I construct an RDD[Vector]? In 1.2 I wrote the results to a text file and then read them back in, splitting them with some code I found in a ML book on Spark Analytics

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2015-02-23 Thread Michael Armbrust
this method: /** * Returns the content of the [[DataFrame]] as an [[RDD]] of [[Row]]s. * @group rdd */ def rdd: RDD[Row] = { FYI On Sun, Feb 22, 2015 at 11:51 AM, stephane.collot stephane.col...@gmail.com wrote: Hi Michael, I think that the feature (convert a SchemaRDD

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2015-02-22 Thread Ted Yu
, 2015 at 11:51 AM, stephane.collot stephane.col...@gmail.com wrote: Hi Michael, I think that the feature (convert a SchemaRDD to a structured class RDD) is now available. But I didn't understand in the PR how exactly to do this. Can you give an example or doc links? Best regards

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2015-02-22 Thread stephane.collot
Hi Michael, I think that the feature (convert a SchemaRDD to a structured class RDD) is now available. But I didn't understand in the PR how exactly to do this. Can you give an example or doc links? Best regards -- View this message in context: http://apache-spark-user-list.1001560.n3

how to get SchemaRDD SQL exceptions i.e. table not found exception

2015-02-13 Thread sachin Singh
Hi, can someone guide how to get SQL exceptions trapped for a query executed using SchemaRDD? I mean, suppose the table is not found. Thanks in advance, -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-get-SchemaRDD-SQL-exceptions-i-e-table-not-found

Re: Spark SQL - Point lookup optimisation in SchemaRDD?

2015-02-11 Thread nitin
-in-SchemaRDD-tp21555p21613.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark SQL - Point lookup optimisation in SchemaRDD?

2015-02-09 Thread nitin
Hi All, I have a use case where I have cached my schemaRDD and I want to launch executors just on the partition which I know of (the prime use case of PartitionPruningRDD). I tried something like the following: val partitionIdx = 2 val schemaRdd = hiveContext.table("myTable") // myTable is cached

Re: LeaseExpiredException while writing schemardd to hdfs

2015-02-05 Thread Petar Zecevic
Why don't you just map the rdd's rows to lines and then call saveAsTextFile()? On 3.2.2015. 11:15, Hafiz Mujadid wrote: I want to write a whole schemaRDD to a single file in hdfs but am facing the following exception org.apache.hadoop.ipc.RemoteException
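
That is, something like this sketch: map each Row to a delimited line, then coalesce to one partition so HDFS ends up with a single part file (fine for small outputs, since it funnels the write through one task):

    schemaRDD.map(_.mkString(","))     // Row -> CSV-ish line
      .coalesce(1)                     // one partition => one part-00000 file
      .saveAsTextFile("/test/data/out")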

Re: Error in saving schemaRDD with Decimal as Parquet

2015-02-03 Thread Manoj Samel
Hi, Any thoughts ? Thanks, On Sun, Feb 1, 2015 at 12:26 PM, Manoj Samel manojsamelt...@gmail.com wrote: Spark 1.2 SchemaRDD has schema with decimal columns created like x1 = new StructField("a", DecimalType(14,4), true) x2 = new StructField("b", DecimalType(14,4), true) Registering as SQL

LeaseExpiredException while writing schemardd to hdfs

2015-02-03 Thread Hafiz Mujadid
I want to write a whole schemaRDD to a single file in hdfs but am facing the following exception org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /test/data/data1.csv (inode 402042): File does not exist. Holder DFSClient_NONMAPREDUCE_-564238432_57

Error in saving schemaRDD with Decimal as Parquet

2015-02-01 Thread Manoj Samel
Spark 1.2 SchemaRDD has schema with decimal columns created like x1 = new StructField("a", DecimalType(14,4), true) x2 = new StructField("b", DecimalType(14,4), true) Registering as SQL Temp table and doing SQL queries on these columns, including SUM etc., works fine, so the schema Decimal does

Re: Error in saving schemaRDD with Decimal as Parquet

2015-02-01 Thread Manoj Samel
I think I found the issue causing it. I was calling schemaRDD.coalesce(n).saveAsParquetFile to reduce the number of partitions in parquet file - in which case the stack trace happens. If I compress the partitions before creating schemaRDD then the schemaRDD.saveAsParquetFile call works

StackOverflowError with SchemaRDD

2015-01-28 Thread ankits
Hi, I am getting a stack overflow error when querying a schemardd comprised of parquet files. This is (part of) the stack trace: Caused by: java.lang.StackOverflowError at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce

Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-20 Thread Cheng Lian
To: Nathan nathan.mccar...@quantium.com.au, Michael Armbrust mich...@databricks.com Cc: user@spark.apache.org Subject: Re: SparkSQL schemaRDD

Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-16 Thread Michael Armbrust
: Monday, 12 January 2015 1:21 am To: Nathan nathan.mccar...@quantium.com.au, Michael Armbrust mich...@databricks.com Cc: user@spark.apache.org user@spark.apache.org Subject: Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats? On 1/11/15 1:40 PM, Nathan McCarthy

Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-15 Thread Nathan McCarthy
@spark.apache.org Subject: Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats? On 1/11/15 1:40 PM, Nathan McCarthy wrote: Thanks Cheng & Michael! Makes sense. Appreciate the tips! Idiomatic Scala isn't performant. I’ll

Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-11 Thread Cheng Lian
@gmail.com Cc: Nathan nathan.mccar...@quantium.com.au, user@spark.apache.org Subject: Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats? The other

Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-10 Thread Nathan McCarthy
@spark.apache.org Subject: Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats? The other thing to note here is that Spark SQL defensively copies rows when we switch into user code. This probably explains the difference between 1 & 2. The difference between 1 & 3

Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-09 Thread Michael Armbrust
mentioned below). Now this takes around ~49 seconds… Even though test1 table is 100% cached. The number of partitions remains the same… Now if I create a simple RDD of a case class HourSum(hour: Int, qty: Double, sales: Double) Convert the SchemaRDD; val rdd = sqlC.sql("select * from test1

Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-09 Thread Cheng Lian
remains the same… Now if I create a simple RDD of a case class HourSum(hour: Int, qty: Double, sales: Double) Convert the SchemaRDD; val rdd = sqlC.sql("select * from test1").map { r => HourSum(r.getInt(1), r.getDouble(7), r.getDouble(8)) }.cache() //cache all the data rdd.count() Then run basically

Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-08 Thread Nathan McCarthy
Any ideas? :) From: Nathan nathan.mccar...@quantium.com.au Date: Wednesday, 7 January 2015 2:53 pm To: user@spark.apache.org Subject: SparkSQL schemaRDD MapPartitions calls

SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?

2015-01-06 Thread Nathan McCarthy
) => (a._1 + b._1, a._2 + b._2)).collect().foreach(println) Now this takes around ~49 seconds… Even though test1 table is 100% cached. The number of partitions remains the same… Now if I create a simple RDD of a case class HourSum(hour: Int, qty: Double, sales: Double) Convert the SchemaRDD; val rdd

Re: Add StructType column to SchemaRDD

2015-01-05 Thread Michael Armbrust
...@preferred.jp wrote: Hi, I have a SchemaRDD where I want to add a column with a value that is computed from the rest of the row. As the computation involves a network operation and requires setup code, I can't use SELECT *, myUDF(*) FROM rdd, but I wanted to use a combination of: - get schema

Re: Add StructType column to SchemaRDD

2015-01-05 Thread Tobias Pfeiffer
Hi Michael, On Tue, Jan 6, 2015 at 3:43 PM, Michael Armbrust mich...@databricks.com wrote: Oh sorry, I'm rereading your email more carefully. It's only because you have some setup code that you want to amortize? Yes, exactly that. Concerning the docs, I'd be happy to contribute, but I don't

Re: Add StructType column to SchemaRDD

2015-01-05 Thread Michael Armbrust
support for partitions in general... We do support Hive TGFs though and we could possibly add better scala syntax for this concept or something else. On Mon, Jan 5, 2015 at 9:52 PM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, I have a SchemaRDD where I want to add a column with a value

Add StructType column to SchemaRDD

2015-01-05 Thread Tobias Pfeiffer
Hi, I have a SchemaRDD where I want to add a column with a value that is computed from the rest of the row. As the computation involves a network operation and requires setup code, I can't use SELECT *, myUDF(*) FROM rdd, but I wanted to use a combination of: - get schema of input SchemaRDD

Re: SchemaRDD to RDD[String]

2014-12-30 Thread Yana
) and see if you get anything -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-to-RDD-String-tp20846p20910.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD

2014-12-24 Thread Cheng Lian
...@gmail.com] Sent: Wednesday, December 24, 2014 4:26 AM To: user@spark.apache.org Subject: SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD Hi spark users, I'm trying to create external table using HiveContext after creating a schemaRDD and saving the RDD into a parquet file on hdfs. I would

Re: SchemaRDD to RDD[String]

2014-12-24 Thread Tobias Pfeiffer
Hi, On Wed, Dec 24, 2014 at 3:18 PM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: I want to convert a schemaRDD into RDD of String. How can we do that? Currently I am doing like this which is not converting correctly no exception but resultant strings are empty here is my code Hehe

Re: SchemaRDD to RDD[String]

2014-12-24 Thread Michael Armbrust
You might also try the following, which I think is equivalent: schemaRDD.map(_.mkString(",")) On Wed, Dec 24, 2014 at 8:12 PM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, On Wed, Dec 24, 2014 at 3:18 PM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: I want to convert a schemaRDD into RDD

SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD

2014-12-23 Thread Jerry Lam
Hi spark users, I'm trying to create external table using HiveContext after creating a schemaRDD and saving the RDD into a parquet file on hdfs. I would like to use the schema in the schemaRDD (rdd_table) when I create the external table. For example: rdd_table.saveAsParquetFile("/user/spark

RE: SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD

2014-12-23 Thread Cheng, Hao
@spark.apache.org Subject: SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD Hi spark users, I'm trying to create external table using HiveContext after creating a schemaRDD and saving the RDD into a parquet file on hdfs. I would like to use the schema in the schemaRDD (rdd_table) when I create

SchemaRDD to RDD[String]

2014-12-23 Thread Hafiz Mujadid
Hi dears! I want to convert a schemaRDD into an RDD of String. How can we do that? Currently I am doing it like this, which is not converting correctly: no exception, but the resultant strings are empty. Here is my code: def SchemaRDDToRDD(schemaRDD: SchemaRDD): RDD[String] = { var

Re: SchemaRDD to Hbase

2014-12-20 Thread Subacini B
Hi, can someone help me? Any pointers would help. Thanks Subacini On Fri, Dec 19, 2014 at 10:47 PM, Subacini B subac...@gmail.com wrote: Hi All, Is there any API that can be used directly to write a schemaRDD to HBase? If not, what is the best way to write a schemaRDD to HBase? Thanks

Re: SchemaRDD to Hbase

2014-12-20 Thread Alex Kamil
I'm using JDBCRDD https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.rdd.JdbcRDD + Hbase JDBC driver http://phoenix.apache.org/ + schemaRDD https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD make sure to use spark 1.2 On Sat, Dec 20

SchemaRDD to Hbase

2014-12-19 Thread Subacini B
Hi All, Is there any API that can be used directly to write a schemaRDD to HBase? If not, what is the best way to write a schemaRDD to HBase? Thanks Subacini

Re: Adding a column to a SchemaRDD

2014-12-15 Thread Yanbo Liang
not need to make the SchemaRDD manually, because jdata.select() returns a SchemaRDD and you can operate on it directly. For example, the following code snippet will return a new SchemaRDD with a longer Row: val t1 = jdata.select(Star(None), 'seven.getField("mod") + 'eleven.getField("mod") as 'mod_sum

Re: SchemaRDD partition on specific column values?

2014-12-15 Thread Nitin Goyal
/SchemaRDD-partition-on-specific-column-values-tp20350p20623.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: SchemaRDD partition on specific column values?

2014-12-14 Thread Michael Armbrust
nitin2go...@gmail.com wrote: Can we take this as a performance improvement task in Spark-1.2.1? I can help contribute to this. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350p20623.html Sent from

Re: Adding a column to a SchemaRDD

2014-12-14 Thread Tobias Pfeiffer
is a scala function, with no luck. Let's say I have a SchemaRDD with columns A, B, and C, and I want to add a new column, D, calculated using Utility.process(b, c), and I want (of course) to pass in the value B and C from each row, ending up with a new SchemaRDD with columns A, B, C, and D

Re: Adding a column to a SchemaRDD

2014-12-12 Thread Yanbo Liang
) import sqlContext._ val d1 = sc.parallelize(1 to 10).map { i => Person(i, i+1, i+2) } val d2 = d1.select('id, 'score, 'id + 'score) d2.foreach(println) 2014-12-12 14:11 GMT+08:00 Nathan Kronenfeld nkronenf...@oculusinfo.com: Hi, there. I'm trying to understand how to augment data in a SchemaRDD. I

Re: Adding a column to a SchemaRDD

2014-12-12 Thread Nathan Kronenfeld
(1) I understand about immutability, that's why I said I wanted a new SchemaRDD. (2) I specifically asked for a non-SQL solution that takes a SchemaRDD and results in a new SchemaRDD with one new function. (3) The DSL stuff is a big clue, but I can't find adequate documentation for it. What I'm

Re: SchemaRDD partition on specific column values?

2014-12-11 Thread nitin
Can we take this as a performance improvement task in Spark-1.2.1? I can help contribute to this. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350p20623.html Sent from the Apache Spark User List mailing

Adding a column to a SchemaRDD

2014-12-11 Thread Nathan Kronenfeld
Hi, there. I'm trying to understand how to augment data in a SchemaRDD. I can see how to do it if I can express the added values in SQL - just run SELECT *, valueCalculation AS newColumnName FROM table I've been searching all over for how to do this if my added value is a scala function

Re: Convert RDD[Map[String, Any]] to SchemaRDD

2014-12-08 Thread Yin Huai
/SPARK-4782 Jianshi On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Hi, What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD? I'm currently converting each Map to a JSON String and do JsonRDD.inferSchema. How about adding inferSchema support

Re: Convert RDD[Map[String, Any]] to SchemaRDD

2014-12-06 Thread Jianshi Huang
Hmm.. I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782 Jianshi On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Hi, What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD? I'm currently converting each Map to a JSON String
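
The JSON detour the thread describes, as a hedged sketch; scala.util.parsing.json is used only for brevity, and it assumes the map values serialize cleanly (a real job would want a proper JSON library):

    import scala.util.parsing.json.JSONObject
    // assume maps is the RDD[Map[String, Any]] in question
    val jsonStrings = maps.map(m => JSONObject(m).toString())
    val schemaRDD = sqlContext.jsonRDD(jsonStrings)   // infers the schema from the JSON
    schemaRDD.printSchema()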

Re: SchemaRDD partition on specific column values?

2014-12-05 Thread Michael Armbrust
this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350p20424.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

SchemaRDD partition on specific column values?

2014-12-04 Thread nitin
on ID by preprocessing it (and then caching it). Thanks in advance -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: SchemaRDD partition on specific column values?

2014-12-04 Thread nitin
JOIN step) and improve overall performance? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350p20424.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: SchemaRDD + SQL , loading projection columns

2014-12-03 Thread Vishnusaran Ramaswamy
? Let me know if this can be done in a different way. Thank you, Vishnu. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-SQL-loading-projection-columns-tp20189.html Sent from the Apache Spark User List mailing list archive at Nabble.com

How to create a new SchemaRDD which is not based on original SparkPlan?

2014-12-03 Thread Tim Chou
Hi All, My question is about lazy running mode for SchemaRDD, I guess. I know lazy mode is good; however, I still have this demand. For example, here is the first SchemaRDD, named results (select * from table where num > 1 and num < 4): results: org.apache.spark.sql.SchemaRDD = SchemaRDD[59] at RDD

SchemaRDD + SQL , loading projection columns

2014-12-02 Thread Vishnusaran Ramaswamy
.1001560.n3.nabble.com/SchemaRDD-SQL-loading-projection-columns-tp20189.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Standard SQL tool access to SchemaRDD

2014-12-02 Thread Jim Carroll
.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-tp20197.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Standard SQL tool access to SchemaRDD

2014-12-02 Thread Michael Armbrust
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-tp20197.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Standard SQL tool access to SchemaRDD

2014-12-02 Thread Jim Carroll
Thanks! I'll give it a try. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-tp20197p20202.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Niranda Perera
Hi Michael, About this new data source API, what type of data sources would it support? Does it necessarily have to be an RDBMS? Cheers On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust mich...@databricks.com wrote: You probably don't need to create a new kind of SchemaRDD. Instead I'd

Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Michael Armbrust
, Nov 29, 2014 at 12:57 AM, Michael Armbrust mich...@databricks.com wrote: You probably don't need to create a new kind of SchemaRDD. Instead I'd suggest taking a look at the data sources API that we are adding in Spark 1.2. There is not a ton of documentation, but the test cases show how

Re: Creating a SchemaRDD from an existing API

2014-11-28 Thread Michael Armbrust
You probably don't need to create a new kind of SchemaRDD. Instead I'd suggest taking a look at the data sources API that we are adding in Spark 1.2. There is not a ton of documentation, but the test cases show how to implement the various interfaces https://github.com/apache/spark/tree/master

Creating a SchemaRDD from an existing API

2014-11-27 Thread Niranda Perera
Hi, I am evaluating Spark for an analytic component where we do batch processing of data using SQL. So, I am particularly interested in Spark SQL and in creating a SchemaRDD from an existing API [1]. This API exposes elements in a database as datasources. Using the methods allowed by this data

Re: Remapping columns from a schemaRDD

2014-11-26 Thread Daniel Haviv
Is there some place I can read more about it? I can't find any reference. I actually want to flatten these structures and not return them from the UDF. Thanks, Daniel On Tue, Nov 25, 2014 at 8:44 PM, Michael Armbrust mich...@databricks.com wrote: Maps should just be scala maps, structs are

SchemaRDD compute function

2014-11-26 Thread Jörg Schad
Hi, I have a short question regarding the compute() of a SchemaRDD. For SchemaRDD the actual queryExecution seems to be triggered via collect(), while the compute triggers only the compute() of the parent and copies the data (Please correct me if I am wrong!). Is this compute() triggered at all

Re: SchemaRDD compute function

2014-11-26 Thread Michael Armbrust
takeOrdered, etc. On Wed, Nov 26, 2014 at 5:05 AM, Jörg Schad joerg.sc...@gmail.com wrote: Hi, I have a short question regarding the compute() of a SchemaRDD. For SchemaRDD the actual queryExecution seems to be triggered via collect(), while the compute triggers only the compute

Remapping columns from a schemaRDD

2014-11-25 Thread Daniel Haviv
Hi, I'm selecting columns from a json file, transform some of them and would like to store the result as a parquet file but I'm failing. This is what I'm doing: val jsonFiles = sqlContext.jsonFile("/requests.loading") jsonFiles.registerTempTable("jRequests") val clean_jRequests = sqlContext.sql("select

Re: Remapping columns from a schemaRDD

2014-11-25 Thread Michael Armbrust
Probably the easiest/closest way to do this would be with a UDF, something like: registerFunction("makeString", (s: Seq[String]) => s.mkString(",")) sql("SELECT *, makeString(c8) AS newC8 FROM jRequests") Although this does not modify a column, but instead appends a new column. Another more

Re: Remapping columns from a schemaRDD

2014-11-25 Thread Daniel Haviv
Thank you. How can I address more complex columns like maps and structs? Thanks again! Daniel On 25 Nov 2014, at 19:43, Michael Armbrust mich...@databricks.com wrote: Probably the easiest/closest way to do this would be with a UDF, something like: registerFunction("makeString", (s:

Re: Remapping columns from a schemaRDD

2014-11-25 Thread Michael Armbrust
Maps should just be scala maps, structs are rows inside of rows. If you want to return a struct from a UDF you can do that with a case class. On Tue, Nov 25, 2014 at 10:25 AM, Daniel Haviv danielru...@gmail.com wrote: Thank you. How can I address more complex columns like maps and structs?
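
For instance, a hedged sketch of returning a struct from a UDF via a case class, per the note above (Spark 1.2-era registerFunction; the table and column names are hypothetical):

    case class NameParts(first: String, last: String)
    import sqlContext._
    registerFunction("splitName", (s: String) => {
      val p = s.split(" ", 2)
      NameParts(p(0), if (p.length > 1) p(1) else "")
    })
    // parts comes back as a struct column with fields first and last
    val withParts = sql("SELECT splitName(name) AS parts FROM people")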

How to deal with BigInt in my case class for RDD = SchemaRDD convertion

2014-11-21 Thread Jianshi Huang
Hi, I got an error during rdd.registerTempTable(...) saying scala.MatchError: scala.BigInt. Looks like BigInt cannot be used in SchemaRDD, is that correct? So what would you recommend to deal with it? Thanks, -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github Blog: http
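
A hedged workaround sketch: convert the BigInt field to a supported type before creating the SchemaRDD, e.g. scala.math.BigDecimal (which maps to DecimalType) or Long if the values fit; rawRdd stands in for the original RDD:

    case class Raw(id: Long, amount: BigInt)        // the problematic shape
    case class Safe(id: Long, amount: BigDecimal)   // BigDecimal maps to DecimalType

    import sqlContext.createSchemaRDD               // Spark 1.x implicit RDD -> SchemaRDD
    val safe = rawRdd.map((r: Raw) => Safe(r.id, BigDecimal(r.amount)))
    safe.registerTempTable("recs")                  // no more scala.MatchError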
