> case 1 => val1
> case 2 => val2
> //... cases up to 26
> }
> }
>
> hence expecting an approach to convert SchemaRDD to RDD without using
> Tuple or Case Class as we have restrictions in Scala 2.10
>
> Regards
> Satish Chandra
>
Hi All,
To convert SchemaRDD to RDD, the snippet below works if the SQL statement
returns fewer than 22 columns per row, as per the tuple restriction:
rdd.map(row => row.toString)
But if the SQL statement has more than 22 columns, then the above snippet will
fail with the error "object Tuple27 is not a member of
Have you seen this thread ?
http://search-hadoop.com/m/q3RTt9YBFr17u8j8=Scala+Limitation+Case+Class+definition+with+more+than+22+arguments
On Fri, Oct 16, 2015 at 7:41 AM, satish chandra j <jsatishchan...@gmail.com>
wrote:
> Hi All,
> To convert SchemaRDD to RDD the snippet below is wo
def canEqual(that: Any): Boolean = that.isInstanceOf[MyRecord]
def productArity: Int = 26 // example value: the number of fields
def productElement(n: Int): Serializable = n match {
  case 0 => val1
  case 1 => val2
  //... cases up to 25
}
}
hence expecting an approach to convert SchemaRDD to RDD without using
Tuple or Case Class as we have restrictions in Scala 2.10
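A minimal sketch of one such approach (names are illustrative, not from the thread): a Row
already behaves like a sequence, so no Tuple or case class is needed and the 22-field limit
does not apply.

val wide = sqlContext.sql("SELECT * FROM wide_table")   // 26+ columns is fine
val asStrings = wide.map(row => row.mkString("|"))      // RDD[String], one delimited line per row
val asSeqs = wide.map(row => row.toSeq)                 // or keep the raw values as Seq[Any]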
Hi,
I have a DStream which is a stream of RDD[String].
How can I pass a DStream to sqlContext.jsonRDD and work with it as a DF ?
Thank you.
Daniel
Date: September 29, 2015 at 5:09 PM
To: Daniel Haviv, user
Subject: RE: Converting a DStream to schemaRDD
Something like:
dstream.foreachRDD { rdd =>
val df = sqlContext.read.json(rdd)
df.select(…)
}
https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operati
work it as if it were a standard RDD dataset.
Ewan
From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com]
Sent: 29 September 2015 15:03
To: user <user@spark.apache.org>
Subject: Converting a DStream to schemaRDD
Hi,
I have a DStream which is a stream of RDD[String].
How can I pass
2015-06-23 16:12 GMT-07:00 Richard Catlin richard.m.cat...@gmail.com:
How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a
column? Is there an example? Will this store as a nested parquet file?
Thanks.
Richard Catlin
I wrote a brief howto on building nested records in spark and storing them
in parquet here:
http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/
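A minimal sketch of building such a nested structure explicitly and saving it as parquet
(Spark 1.3+ API; the schema and paths are illustrative):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val lineSchema = StructType(Seq(
  StructField("item", StringType, nullable = true),
  StructField("qty", IntegerType, nullable = true)))
val schema = StructType(Seq(
  StructField("order_id", LongType, nullable = false),
  StructField("lines", ArrayType(lineSchema), nullable = true)))   // nested array of Rows

val rows = sc.parallelize(Seq(
  Row(1L, Seq(Row("apple", 2), Row("pear", 1))),
  Row(2L, Seq(Row("plum", 5)))))
val df = sqlContext.createDataFrame(rows, schema)
df.saveAsParquetFile("/tmp/orders.parquet")   // written as a nested parquet schema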
Is there a way to display the contents of an RDD/SchemaRDD on the
screen in a formatted way? For example, say I want to take() the first 30
lines/rows in an RDD and present them in a readable way on the screen so
that I can see what's missing or invalid. Obviously, I'm just trying to
sample the results in a readable way
Depending on your spark version, you can convert schemaRDD to a dataframe
and then use .show()
On 30 May 2015 10:33, Minnow Noir minnown...@gmail.com wrote:
I'm trying to debug query results inside spark-shell, but finding it
cumbersome to save to file and then use file system utils to explore
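A minimal sketch of that suggestion (Spark 1.3+, where the query result is a DataFrame; the
query itself is illustrative):

val df = sqlContext.sql("SELECT * FROM my_table")
df.show(30)   // prints the first 30 rows as a formatted table
// Without .show(), a rough equivalent is to format the sampled rows yourself:
df.take(30).foreach(row => println(row.mkString("\t")))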
Hi all, following the
import com.datastax.spark.connector.SelectableColumnRef;
import com.datastax.spark.connector.japi.CassandraJavaUtil;
import org.apache.spark.sql.SchemaRDD;
import static com.datastax.spark.connector.util.JavaApiHelper.toScalaSeq;
import scala.collection.Seq;
SchemaRDD
into one long list
of columns as I would be able to find some weird stuff by doing that. So my
question is the following:
1. Does SchemaRDD support something like multi value attributes? It might look
like an array of values that lives in just one
column. Although it’s not clear how I’d aggregate
Hi experts!
I would like to know: is there any way to store a schemaRDD to Cassandra?
If yes, how do I store it into an existing Cassandra column family, and into a
new column family?
Thanks
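A minimal sketch of one way to do this with the DataStax spark-cassandra-connector (not from
this thread; keyspace, table and column names are illustrative, and the target table must
already exist):

import com.datastax.spark.connector._

case class User(id: Int, name: String)
val users = schemaRDD.map(r => User(r.getInt(0), r.getString(1)))   // map Rows onto a case class
users.saveToCassandra("my_keyspace", "users", SomeColumns("id", "name"))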
assumptions about partitioning.
On Mon, Mar 23, 2015 at 10:22 AM, Stephen Boesch java...@gmail.com wrote:
Is there a way to take advantage of the underlying datasource partitions
when generating a DataFrame/SchemaRDD via catalyst? It seems from the sql
module that the only options are RangePartitioner and HashPartitioner - and
further that those are selected automatically by the code
Hi All,
I was wondering how RDD transformations work on SchemaRDDs. Is there a way
to force the RDD transform to keep the SchemaRDD type, or do I need to
recreate the SchemaRDD by applying the applySchema method?
Currently what I have is an array of SchemaRDDs and I just want to do a
union
Looks like if I use unionAll this works.
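A minimal sketch of the difference (SchemaRDD-era API; variable names are illustrative): plain
RDD transforms such as union() return RDD[Row] and drop the schema, while unionAll() stays in
SQL land and keeps it.

val combined = schemaRDDs.reduce((a, b) => a.unionAll(b))   // still a SchemaRDD, schema preserved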
Spark Version - 1.1.0
Scala - 2.10.4
I have loaded the following type of data from a parquet file, stored in a schemaRDD:
[7654321,2015-01-01 00:00:00.007,0.49,THU]
Since, in Spark version 1.1.0, the parquet format doesn't support saving
timestamp values, I have saved the timestamp data as string. Can you please
tell me how to iterate over the data in this schemaRDD to retrieve the timestamp values?
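A minimal sketch of iterating over the rows and parsing the string column back into a
timestamp (column position and format are assumptions based on the sample row above):

import java.sql.Timestamp
val parsed = schemaRDD.map { row =>
  val ts = Timestamp.valueOf(row.getString(1))   // "2015-01-01 00:00:00.007" parses with Timestamp.valueOf
  (row(0), ts, row(2), row(3))
}
parsed.take(5).foreach(println)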
registerTempTable is just a Map[String, SchemaRDD]
insertion, nothing that would be measurable. But there are no
distributed/RDD operations involved, I think.
Tobias
transformer classes for feature extraction, and if I need to save the
input and maybe output SchemaRDD of the transform function in every
transformer, this may not be very efficient.
Thanks
On Tue, Mar 10, 2015 at 8:20 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Tue, Mar 10, 2015 at 2:13 PM
Hi,
On Tue, Mar 10, 2015 at 2:13 PM, Cesar Flores ces...@gmail.com wrote:
I am new to the SchemaRDD class, and I am trying to decide between using SQL
queries or Language Integrated Queries (
https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
).
Can someone tell me what is the main difference between the two approaches,
besides using
They should have the same performance, as they are compiled down to the
same execution plan.
Note that starting in Spark 1.3, SchemaRDD is renamed DataFrame:
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
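For illustration, the two styles side by side (Spark 1.2-era DSL; table and column names are
made up, and `people` is assumed to be a SchemaRDD registered as a temp table):

import sqlContext._
val bySql = sql("SELECT name, age FROM people WHERE age > 21")   // SQL string
val byDsl = people.where('age > 21).select('name, 'age)          // language-integrated query
// Both produce a SchemaRDD backed by the same logical plan.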
On Tue, Mar 10, 2015 at 2:13
Hi Wush,
I'm CC'ing user@spark.apache.org (which is the new list) and BCC'ing
u...@spark.incubator.apache.org.
In Spark 1.3, schemaRDD is in fact being renamed to DataFrame (see:
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
Dear all,
I am a new spark user from R.
After exploring the schemaRDD, I notice that it is similar to data.frame.
Is there a feature like `model.matrix` in R to convert schemaRDD to model
matrix automatically according to the type without explicitly converting
them one by one?
Thanks,
Wush
Hi, in the roadmap of Spark in 2015 (link:
http://files.meetup.com/3138542/Spark%20in%202015%20Talk%20-%20Wendell.pptx),
I saw SchemaRDD is designed to be the basis of BOTH Spark
Streaming and Spark SQL.
My question is: what's the typical usage of SchemaRDD in a Spark
Streaming application
Hi All,
I am currently trying to build out a Spark job that would basically convert
a CSV file into parquet. From what I have seen it looks like Spark SQL is
the way to go, and how I would go about this would be to load the CSV file
into an RDD and convert it into a schemaRDD by injecting in the schema via
a case class.
What I want to avoid is hard coding in the case class itself. I want to
reuse this job
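One way to avoid the hard-coded case class is to build the schema at runtime with applySchema;
a minimal sketch (Spark 1.1/1.2-era API; the path and the header-row convention are
assumptions):

import org.apache.spark.sql._

val lines = sc.textFile("/data/input.csv")
val headerLine = lines.first()
val schema = StructType(headerLine.split(",").map(name => StructField(name, StringType, true)))
val rows = lines.filter(_ != headerLine).map(l => Row(l.split(","): _*))
val schemaRDD = sqlContext.applySchema(rows, schema)   // no case class needed
schemaRDD.saveAsParquetFile("/data/output.parquet")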
I've been searching around and see others have asked similar questions.
Given a schemaRDD I extract a result set that contains numbers, both Ints and
Doubles. How do I construct an RDD[Vector]? In 1.2 I wrote the results to a
text file and then read them back in, splitting them with some code I found in
an ML book on Spark Analytics
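A minimal sketch of building the RDD[Vector] straight from the rows, skipping the text-file
round trip (column handling is illustrative):

import org.apache.spark.mllib.linalg.Vectors

val vectors = schemaRDD.map { row =>
  Vectors.dense(row.toSeq.map {
    case i: Int => i.toDouble
    case d: Double => d
    case other => other.toString.toDouble   // fallback for other numeric representations
  }.toArray)
}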
this method:
/**
* Returns the content of the [[DataFrame]] as an [[RDD]] of [[Row]]s.
* @group rdd
*/
def rdd: RDD[Row] = {
FYI
On Sun, Feb 22, 2015 at 11:51 AM, stephane.collot stephane.col...@gmail.com
wrote:
Hi Michael,
I think that the feature (convert a SchemaRDD to a structured class RDD) is
now available. But I didn't understand in the PR how exactly to do this. Can
you give an example or doc links?
Best regards
Hi,
Can someone advise how to trap a SQL exception for a query executed using a
SchemaRDD?
I mean, for example, a "table not found" error.
Thanks in advance,
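A minimal sketch of trapping the failure (the exact exception type thrown for a missing table
varies by version, so this just catches the failure generically):

import scala.util.{Try, Success, Failure}

Try(sqlContext.sql("SELECT * FROM no_such_table").collect()) match {
  case Success(rows) => rows.foreach(println)
  case Failure(e)    => println(s"query failed: ${e.getMessage}")   // e.g. a "table not found" message
}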
Hi All,
I have a use case where I have cached my schemaRDD and I want to launch
executors just on the partition which I know of (prime use-case of
PartitionPruningRDD).
I tried something like the following:
val partitionIdx = 2
val schemaRdd = hiveContext.table("myTable") // myTable is cached
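A minimal sketch of how the pruning itself can be expressed from there (PartitionPruningRDD is
a developer API; the partition index is illustrative):

import org.apache.spark.rdd.PartitionPruningRDD

val pruned = PartitionPruningRDD.create(schemaRdd, idx => idx == partitionIdx)
pruned.count()   // launches tasks only for the matching partition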
Why don't you just map rdd's rows to lines and then call saveAsTextFile()?
On 3.2.2015. 11:15, Hafiz Mujadid wrote:
I want to write a whole schemaRDD to a single file in HDFS but am facing the following
exception
org.apache.hadoop.ipc.RemoteException
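A minimal sketch of that suggestion, with coalesce(1) so only one part file is written (the
path and delimiter are illustrative; a single file only makes sense for modestly sized output):

schemaRDD
  .map(row => row.mkString(","))   // one delimited line per row
  .coalesce(1)                     // single partition => single part file
  .saveAsTextFile("/test/data/out")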
Hi,
Any thoughts ?
Thanks,
On Sun, Feb 1, 2015 at 12:26 PM, Manoj Samel manojsamelt...@gmail.com
wrote:
Spark 1.2
SchemaRDD has schema with decimal columns created like
x1 = new StructField("a", DecimalType(14,4), true)
x2 = new StructField("b", DecimalType(14,4), true)
Registering as SQL
I want to write a whole schemaRDD to a single file in HDFS but am facing the following
exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /test/data/data1.csv (inode 402042): File does not exist. Holder
DFSClient_NONMAPREDUCE_-564238432_57
Spark 1.2
SchemaRDD has schema with decimal columns created like
x1 = new StructField("a", DecimalType(14,4), true)
x2 = new StructField("b", DecimalType(14,4), true)
Registering as a SQL temp table and doing SQL queries on these columns,
including SUM etc., works fine, so the schema Decimal does
I think I found the issue causing it.
I was calling schemaRDD.coalesce(n).saveAsParquetFile to reduce the number
of partitions in the parquet file - in which case the stack trace happens.
If I coalesce the partitions before creating the schemaRDD then the
schemaRDD.saveAsParquetFile call works
Hi,
I am getting a stack overflow error when querying a schemardd comprised of
parquet files. This is (part of) the stack trace:
Caused by: java.lang.StackOverflowError
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at
scala.collection.TraversableOnce
Date: Monday, 12 January 2015 1:21 am
To: Nathan nathan.mccar...@quantium.com.au, Michael Armbrust mich...@databricks.com
Cc: user@spark.apache.org
Subject: Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?
On 1/11/15 1:40 PM, Nathan McCarthy wrote:
Thanks Cheng & Michael! Makes sense. Appreciate the tips!
Idiomatic Scala isn't performant. I'll
Subject: Re: SparkSQL schemaRDD MapPartitions calls - performance issues - columnar formats?
The other thing to note here is that Spark SQL defensively copies rows when we
switch into user code. This probably explains the difference between 1 & 2.
The difference between 1 & 3
mentioned below).
Now this takes around ~49 seconds… Even though the test1 table is 100%
cached. The number of partitions remains the same…
Now if I create a simple RDD of a case class HourSum(hour: Int, qty: Double,
sales: Double) and convert the SchemaRDD:
val rdd = sqlC.sql("select * from test1").map { r =>
  HourSum(r.getInt(1), r.getDouble(7), r.getDouble(8)) }.cache() // cache all the data
rdd.count()
Then run basically
Any ideas? :)
From: Nathan nathan.mccar...@quantium.com.au
Date: Wednesday, 7 January 2015 2:53 pm
To: user@spark.apache.org
Subject: SparkSQL schemaRDD MapPartitions calls
) => (a._1 + b._1, a._2 + b._2)).collect().foreach(println)
...@preferred.jp wrote:
Hi,
I have a SchemaRDD where I want to add a column with a value that is
computed from the rest of the row. As the computation involves a
network operation and requires setup code, I can't use
SELECT *, myUDF(*) FROM rdd,
but I wanted to use a combination of:
- get schema
Hi Michael,
On Tue, Jan 6, 2015 at 3:43 PM, Michael Armbrust mich...@databricks.com
wrote:
Oh sorry, I'm rereading your email more carefully. It's only because you
have some setup code that you want to amortize?
Yes, exactly that.
Concerning the docs, I'd be happy to contribute, but I don't
support for partitions in general...
We do have support for Hive TGFs though, and we could possibly add better Scala
syntax for this concept or something else.
On Mon, Jan 5, 2015 at 9:52 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
I have a SchemaRDD where I want to add a column with a value that is
computed from the rest of the row. As the computation involves a
network operation and requires setup code, I can't use
SELECT *, myUDF(*) FROM rdd,
but I wanted to use a combination of:
- get schema of input SchemaRDD
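A minimal sketch of roughly that combination, using mapPartitions so the setup runs once per
partition and applySchema to re-attach an extended schema (ExpensiveClient and the new column
are purely hypothetical):

import org.apache.spark.sql._

val input = sqlContext.sql("SELECT * FROM rdd")
val newSchema = StructType(input.schema.fields :+ StructField("computed", StringType, true))
val augmented = input.mapPartitions { rows =>
  val client = new ExpensiveClient()                       // hypothetical per-partition setup
  rows.map(r => Row((r.toSeq :+ client.lookup(r)): _*))
}
val result = sqlContext.applySchema(augmented, newSchema)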
) and see if you get
anything
...@gmail.com]
*Sent:* Wednesday, December 24, 2014 4:26 AM
*To:* user@spark.apache.org
*Subject:* SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD
Hi spark users,
I'm trying to create an external table using HiveContext after creating a
schemaRDD and saving the RDD into a parquet file on hdfs.
I would
Hi,
On Wed, Dec 24, 2014 at 3:18 PM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
I want to convert a schemaRDD into RDD of String. How can we do that?
Currently I am doing it like this, which is not converting correctly: no
exception, but the resultant strings are empty.
here is my code
Hehe
You might also try the following, which I think is equivalent:
schemaRDD.map(_.mkString(","))
On Wed, Dec 24, 2014 at 8:12 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Wed, Dec 24, 2014 at 3:18 PM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
I want to convert a schemaRDD into RDD
Hi spark users,
I'm trying to create an external table using HiveContext after creating a
schemaRDD and saving the RDD into a parquet file on hdfs.
I would like to use the schema in the schemaRDD (rdd_table) when I create
the external table.
For example:
rdd_table.saveAsParquetFile("/user/spark
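A minimal sketch of one way to wire this up, with the column list written out by hand (the
path, table name, and columns are illustrative; this assumes a Hive version that accepts
STORED AS PARQUET):

rdd_table.saveAsParquetFile("/user/spark/my_table")
hiveContext.sql("""
  CREATE EXTERNAL TABLE my_table (id INT, name STRING)
  STORED AS PARQUET
  LOCATION '/user/spark/my_table'
""")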
Hi dears!
I want to convert a schemaRDD into RDD of String. How can we do that?
Currently I am doing it like this, which is not converting correctly: no
exception, but the resultant strings are empty.
here is my code
def SchemaRDDToRDD(schemaRDD: SchemaRDD): RDD[String] = {
var
Hi,
Can someone help me? Any pointers would help.
Thanks
Subacini
On Fri, Dec 19, 2014 at 10:47 PM, Subacini B subac...@gmail.com wrote:
Hi All,
Is there any API that can be used directly to write schemaRDD to HBase??
If not, what is the best way to write schemaRDD to HBase.
Thanks
I'm using JDBCRDD
https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.rdd.JdbcRDD
+ HBase JDBC driver http://phoenix.apache.org/ + schemaRDD
https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
make sure to use spark 1.2
On Sat, Dec 20
Hi All,
Is there any API that can be used directly to write schemaRDD to HBase??
If not, what is the best way to write schemaRDD to HBase.
Thanks
Subacini
not need to make a SchemaRDD
manually.
Because jdata.select() returns a SchemaRDD, you can operate on it
directly.
For example, the following code snippet will return a new SchemaRDD with a
longer Row:
val t1 = jdata.select(Star(None), 'seven.getField("mod") +
'eleven.getField("mod") as 'mod_sum)
is a
scala function, with no luck.
Let's say I have a SchemaRDD with columns A, B, and C, and I want to add a
new column, D, calculated using Utility.process(b, c), and I want (of
course) to pass in the values of B and C from each row, ending up with a new
SchemaRDD with columns A, B, C, and D
)
import sqlContext._
val d1 = sc.parallelize(1 to 10).map { i => Person(i, i+1, i+2) }
val d2 = d1.select('id, 'score, 'id + 'score)
d2.foreach(println)
2014-12-12 14:11 GMT+08:00 Nathan Kronenfeld nkronenf...@oculusinfo.com:
Hi, there.
I'm trying to understand how to augment data in a SchemaRDD.
I
(1) I understand about immutability, that's why I said I wanted a new
SchemaRDD.
(2) I specifically asked for a non-SQL solution that takes a SchemaRDD, and
results in a new SchemaRDD with one new function.
(3) The DSL stuff is a big clue, but I can't find adequate documentation
for it
What I'm
Can we take this as a performance improvement task in Spark-1.2.1? I can help
contribute for this.
Hi, there.
I'm trying to understand how to augment data in a SchemaRDD.
I can see how to do it if I can express the added values in SQL - just run
SELECT *, valueCalculation AS newColumnName FROM table
I've been searching all over for how to do this if my added value is a
scala function
Hmm..
I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782
Jianshi
On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
I'm currently converting each Map to a JSON String and doing
JsonRDD.inferSchema.
How about adding inferSchema support
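For reference, a minimal sketch of the JSON round trip being described (the serializer is left
as a placeholder called toJsonString):

val jsonStrings = maps.map(m => toJsonString(m))   // maps: RDD[Map[String, Any]]
val schemaRDD = sqlContext.jsonRDD(jsonStrings)    // schema is inferred from the JSON
schemaRDD.printSchema()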
on ID by
preprocessing it (and then cache it).
Thanks in Advance
JOIN step) and improve overall performance?
?
Let me know if this can be done in a different way.
Thank you,
Vishnu.
Hi All,
My question is about lazy running mode for SchemaRDD, I guess. I know lazy
mode is good, however, I still have this demand.
For example, here is the first SchemaRDD, named results (select * from table
where num > 1 and num < 4):
results: org.apache.spark.sql.SchemaRDD =
SchemaRDD[59] at RDD
Thanks! I'll give it a try.
Hi Michael,
About this new data source API, what type of data sources would it support?
Does it have to be RDBMS necessarily?
Cheers
On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust mich...@databricks.com
wrote:
You probably don't need to create a new kind of SchemaRDD. Instead I'd
suggest taking a look at the data sources API that we are adding in Spark
1.2. There is not a ton of documentation, but the test cases show how to
implement the various interfaces
https://github.com/apache/spark/tree/master
Hi,
I am evaluating Spark for an analytic component where we do batch
processing of data using SQL.
So, I am particularly interested in Spark SQL and in creating a SchemaRDD
from an existing API [1].
This API exposes elements in a database as datasources. Using the methods
allowed by this data
Is there some place I can read more about it? I can't find any reference.
I actually want to flatten these structures and not return them from the UDF.
Thanks,
Daniel
On Tue, Nov 25, 2014 at 8:44 PM, Michael Armbrust mich...@databricks.com
wrote:
Maps should just be scala maps, structs are
Hi,
I have a short question regarding the compute() of a SchemaRDD.
For SchemaRDD the actual queryExecution seems to be triggered via
collect(), while the compute triggers only the compute() of the parent and
copies the data (Please correct me if I am wrong!).
Is this compute() triggered at all
takeOrdered, etc.
On Wed, Nov 26, 2014 at 5:05 AM, Jörg Schad joerg.sc...@gmail.com wrote:
Hi,
I have a short question regarding the compute() of a SchemaRDD.
For SchemaRDD the actual queryExecution seems to be triggered via
collect(), while the compute triggers only the compute
Hi,
I'm selecting columns from a json file, transforming some of them, and would
like to store the result as a parquet file, but I'm failing.
This is what I'm doing:
val jsonFiles = sqlContext.jsonFile("/requests.loading")
jsonFiles.registerTempTable("jRequests")
val clean_jRequests = sqlContext.sql("select
Probably the easiest/closest way to do this would be with a UDF, something
like:
registerFunction("makeString", (s: Seq[String]) => s.mkString(","))
sql("SELECT *, makeString(c8) AS newC8 FROM jRequests")
Although this does not modify a column, but instead appends a new column.
Another more
Thank you.
How can I address more complex columns like maps and structs?
Thanks again!
Daniel
On 25 Nov 2014, at 19:43, Michael Armbrust mich...@databricks.com wrote:
Probably the easiest/closest way to do this would be with a UDF, something
like:
registerFunction("makeString", (s:
Maps should just be scala maps, structs are rows inside of rows. If you
want to return a struct from a UDF you can do that with a case class.
On Tue, Nov 25, 2014 at 10:25 AM, Daniel Haviv danielru...@gmail.com
wrote:
Thank you.
How can I address more complex columns like maps and structs?
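A minimal sketch of the case-class-as-struct suggestion above (names and the pre-1.3
registerFunction spelling are illustrative):

case class NameParts(first: String, last: String)
sqlContext.registerFunction("splitName", (full: String) => {
  val parts = full.split(" ", 2)
  NameParts(parts(0), if (parts.length > 1) parts(1) else "")
})
sqlContext.sql("SELECT splitName(name) AS name_struct FROM people")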
Hi,
I got an error during rdd.registerTempTable(...) saying scala.MatchError:
scala.BigInt
Looks like BigInt cannot be used in SchemaRDD, is that correct?
So what would you recommend to deal with it?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http
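A common workaround (my suggestion, not from this thread) is to convert the BigInt fields to a
supported type before creating the SchemaRDD; a minimal sketch:

case class Record(id: BigInt, name: String)
case class SqlFriendlyRecord(id: Long, name: String)   // or BigDecimal if Long could overflow
val sqlFriendly = records.map(r => SqlFriendlyRecord(r.id.toLong, r.name))
// register sqlFriendly as the temp table instead of the original RDD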