There is no automated solution right now. You have to issue manual ALTER
TABLE commands, which work for adding top-level columns but get tricky if
you are adding a field in a deeply nested struct.
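For the top-level case, a minimal sketch, assuming a Hive-backed table
named events (table and column names are illustrative):

  // issue the DDL through a HiveContext; the statement is standard HiveQL
  hiveContext.sql("ALTER TABLE events ADD COLUMNS (new_field STRING)")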
Hopefully, the issue will be fixed in 2.2 because work has started on
SPARK-7301. See https://issues.apache.org/jira/browse/SPARK-7301
--
You likely need to add the Cassandra connector JAR to spark.jars so it is
available to the executors.
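For example, a minimal sketch assuming Spark 1.x (the JAR path and
connector version are illustrative; match them to your cluster):

  import org.apache.spark.{SparkConf, SparkContext}

  // spark.jars distributes the listed JARs to the driver and executors
  val conf = new SparkConf()
    .setAppName("cassandra-example")
    .set("spark.jars", "/path/to/spark-cassandra-connector_2.10-1.5.0.jar")
  val sc = new SparkContext(conf)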
Hope this helps,
Sim
--
DataFrameReader.json() can handle gzipped JSONlines files automatically,
but there doesn't seem to be a way to get DataFrameWriter.json() to write
compressed JSONlines files. Uncompressed JSONlines is very expensive from
an I/O standpoint because field names are included in every record.
Is there a way to get compressed output that I am missing?
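A possible workaround, sketched for Spark 1.x and assuming a DataFrame
named df (the output path is illustrative): serialize the rows to JSON
strings and write them through the RDD API with a compression codec.

  import org.apache.hadoop.io.compress.GzipCodec

  // df.toJSON yields the rows as an RDD of JSON-line strings in 1.x
  df.toJSON.saveAsTextFile("/path/to/output", classOf[GzipCodec])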
Yes, I've found a number of problems with metadata management in Spark SQL.
One core issue is SPARK-9764 (https://issues.apache.org/jira/browse/SPARK-9764).
Related issues are SPARK-9342 (https://issues.apache.org/jira/browse/SPARK-9342)
and SPARK-9761.
Adam, did you find a solution for this?
--
The schema merging section of the Spark SQL documentation
(http://spark.apache.org/docs/latest/sql-programming-guide.html#schema-merging)
shows an example of schema evolution in a partitioned table.
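For reference, a minimal sketch along the lines of the documented example
(paths and column names are illustrative):

  import sqlContext.implicits._

  // write two partitions with overlapping but different schemas
  val df1 = sc.parallelize(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
  df1.write.parquet("data/test_table/key=1")

  val df2 = sc.parallelize(6 to 10).map(i => (i, i * 3)).toDF("single", "triple")
  df2.write.parquet("data/test_table/key=2")

  // reading with mergeSchema yields the union of the partition schemas
  val df3 = sqlContext.read.option("mergeSchema", "true").parquet("data/test_table")
  df3.printSchema()  // single, double, triple, key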
Is this functionality only available when creating a Spark SQL table?
I don't think INSERT INTO is supported.
--
Are there recommended strategies for dealing with these types of cleanup
failures? Do they tend to come in known varieties?
Thanks,
Sim
--
The query then becomes very simple:

  SELECT 3600 * FLOOR(timestamp / 3600) AS timestamp,
         COUNT(error) AS errors
  FROM logs
  GROUP BY 3600 * FLOOR(timestamp / 3600)

Hope this helps.
/Sim
--
What is the rationale for not allowing the same column in a GroupedData to be
aggregated more than once using agg, especially when the method signature
def agg(aggExpr: (String, String), aggExprs: (String, String)*) allows
passing something like agg("x" -> "sum", "x" -> "avg")?
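That is (a sketch, assuming a DataFrame df with columns key and x), the
signature appears to permit:

  df.groupBy("key").agg("x" -> "sum", "x" -> "avg")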
--
showing how spark-shell is started.
Should the 1.4.0 spark-shell be started with different options to avoid this
problem?
Thanks,
Sim