Re: Schema evolution in tables

2017-01-13 Thread sim
There is no automated solution right now. You have to issue manual ALTER TABLE commands, which work for adding top-level columns but get tricky if you are adding a field in a deeply nested struct. Hopefully, the issue will be fixed in 2.2 because work has started on
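
For the top-level case mentioned above, a manual statement can be generated rather than typed by hand. The following is a minimal sketch, not an official tool: the helper name `add_columns_ddl`, the table name `events`, and the column names are all hypothetical, and it assumes the engine that owns the table metadata accepts Hive-style `ALTER TABLE ... ADD COLUMNS`. It does not attempt the nested-struct case.

```python
def add_columns_ddl(table, columns):
    """Build a Hive-style ALTER TABLE ... ADD COLUMNS statement.

    `columns` is a list of (name, sql_type) pairs describing new
    top-level columns. Nested struct fields are out of scope here.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Backtick-quote each column name and pair it with its type.
    cols = ", ".join("`{}` {}".format(name, sql_type) for name, sql_type in columns)
    return "ALTER TABLE {} ADD COLUMNS ({})".format(table, cols)


# Hypothetical table and columns, for illustration only.
ddl = add_columns_ddl("events", [("referrer", "string"), ("latency_ms", "bigint")])
print(ddl)
```

The resulting string would still have to be submitted by hand to whichever SQL interface manages the table.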

Re: "Ambiguous references" to a field set in a partitioned table AND the data

2015-12-18 Thread sim
See https://issues.apache.org/jira/browse/SPARK-7301 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Ambiguous-references-to-a-field-set-in-a-partitioned-table-AND-the-data-tp22325p25740.html Sent from the Apache Spark User List mailing list archive at

Re: Class cast exception : Spark 1.5

2015-09-21 Thread sim
You likely need to add the Cassandra connector JAR to spark.jars so it is available to the executors. Hope this helps, Sim

Writing a DataFrame as compressed JSON

2015-08-10 Thread sim
DataFrameReader.json() can handle gzipped JSONlines files automatically, but there doesn't seem to be a way to get DataFrameWriter.json() to write compressed JSONlines files. Uncompressed JSONlines is very expensive from an I/O standpoint because field names are included in every record. Is
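
To make the I/O cost concrete, here is a small sketch that uses only the Python standard library, not Spark's writer. It writes gzipped JSONlines and compares sizes. The file name, record fields, and helper names are made up for illustration.

```python
import gzip
import json
import os
import tempfile

# Hypothetical records: the same field names repeat in every line.
records = [{"user_id": i, "event_type": "click", "error": False} for i in range(1000)]


def write_jsonl_gz(path, rows):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl_gz(path):
    """Read a gzip-compressed JSONlines file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]


tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "events.json.gz")
write_jsonl_gz(gz_path, records)

# Size the uncompressed output would have had (ASCII, so chars == bytes).
raw_size = sum(len(json.dumps(r)) + 1 for r in records)
gz_size = os.path.getsize(gz_path)
round_trip = read_jsonl_gz(gz_path)
print(raw_size, gz_size)
```

Because the keys repeat on every line, gzip removes most of that redundancy, which is why a compressed writer matters for this format.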

Re: Schema change on Spark Hive (Parquet file format) table not working

2015-08-08 Thread sim
Yes, I've found a number of problems with metadata management in Spark SQL. One core issue is SPARK-9764 (https://issues.apache.org/jira/browse/SPARK-9764). Related issues are SPARK-9342 (https://issues.apache.org/jira/browse/SPARK-9342), SPARK-9761

Re: Spark inserting into parquet files with different schema

2015-08-08 Thread sim
Adam, did you find a solution for this?

Schema evolution in tables

2015-07-26 Thread sim
The schema merging section (http://spark.apache.org/docs/latest/sql-programming-guide.html#schema-merging) of the Spark SQL documentation shows an example of schema evolution in a partitioned table. Is this functionality only available when creating a Spark SQL table?

Re: Insert data into a table

2015-07-25 Thread sim
I don't think INSERT INTO is supported.

Cleanup when tasks generate errors

2015-07-17 Thread sim
for dealing with these types of cleanup failures? Do they tend to come in known varieties? Thanks, Sim

Re: Spark SQL groupby timestamp

2015-07-03 Thread sim
becomes very simple:

    select 3600*floor(timestamp/3600) as timestamp, count(error) as errors
    from logs
    group by 3600*floor(timestamp/3600)

Hope this helps. /Sim
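
The bucketing expression 3600*floor(timestamp/3600) rounds each timestamp down to the start of its hour. The sketch below mirrors that arithmetic in plain Python. It assumes `timestamp` holds integer seconds, which the snippet above does not state, and the sample rows are invented.

```python
from collections import Counter


def hour_bucket(ts):
    """Round an integer timestamp (seconds) down to the start of its hour."""
    return 3600 * (ts // 3600)


# Invented sample rows standing in for the `logs` table.
logs = [
    {"timestamp": 7200, "error": "E1"},
    {"timestamp": 7259, "error": "E2"},
    {"timestamp": 10799, "error": None},
    {"timestamp": 10800, "error": "E3"},
]

errors_per_hour = Counter()
for row in logs:
    bucket = hour_bucket(row["timestamp"])
    # Like SQL count(error), only non-null values are counted.
    errors_per_hour[bucket] += 0 if row["error"] is None else 1

print(dict(errors_per_hour))
```

Rows at 7200, 7259, and 10799 land in the same bucket (7200), while 10800 starts the next one.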

Aggregating the same column multiple times

2015-07-02 Thread sim
What is the rationale for not allowing the same column in a GroupedData to be aggregated more than once using agg, especially when the method signature def agg(aggExpr: (String, String), aggExprs: (String, String)*) allows passing something like agg("x" -> "sum", "x" -> "avg")?
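
One way such a restriction can arise is sketched below in plain Python. This is an assumption for illustration, not a statement about Spark's internals confirmed in this thread: if (column, function) pairs are collected into a map keyed by column name, a later pair for the same column silently replaces the earlier one.

```python
# Hypothetical (column, aggregate function) pairs, as a caller might pass them.
pairs = [("x", "sum"), ("x", "avg"), ("y", "max")]

# Collecting the pairs into a mapping keyed by column keeps only the
# last function given for each column.
as_map = dict(pairs)

# Keeping the pairs as a list preserves every requested aggregation.
as_list = list(pairs)

print(as_map)
print(as_list)
```

Under that assumption, a varargs signature of pairs would accept the call, yet only one aggregation per column would survive.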

1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread sim
showing how spark-shell is started. Should the 1.4.0 spark-shell be started with different options to avoid this problem? Thanks, Sim