Hive on Spark Job Monitoring
Hi Team, I would like to understand how Hive on Spark maps Hive queries to the underlying Spark jobs. AFAIK, each Hive query triggers a new Spark job, but someone contradicted this, and I would like to confirm what the actual design is. Please let me know if there is a reference or design doc that explains this, or reply here if you know the answer. Thanks, Ninad
Making withColumn nullable
Hi Team, When I add a column to my DataFrame using withColumn and assign it a value, the resulting schema automatically marks that column as not nullable. The final Hive table I want to insert into defines this column as nullable, so the save fails with an error. Is there a way to make a column added with withColumn nullable? Thanks, Ninad
Creating UUID using Spark SQL
Hi Team, Is there a standard way of generating a unique ID for each row in Spark SQL? I am looking for functionality similar to UUID generation in Hive. Let me know if you need any additional information. Thanks, Ninad
[DataFrames] map function - 2.0
Hi Team, While going through the Dataset class for Spark 2.0, I noticed that both overloaded map functions, with and without an encoder, are marked as experimental. Is there a reason for this, and are there issues that developers should be aware of when using them in production applications? Also, is there a "non-experimental" way of using the map function on a DataFrame in Spark 2.0? Thanks, Ninad
Re: [Spark-SQL] collect_list() support for nested collection
Exactly what I was looking for. Thank you so much!!
On Tue, Dec 13, 2016 at 6:15 PM Michael Armbrust wrote:
> Yes
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/4464261896877850/2840265927289860/latest.html
> On Tue, Dec 13, 2016 at 10:43 AM, Ninad Shringarpure wrote:
> > Hi Team,
> > Does Spark 2.0 support non-primitive types in collect_list for inserting nested collections?
> > Would appreciate any references or samples.
> > Thanks,
> > Ninad
Fwd: [Spark-SQL] collect_list() support for nested collection
Hi Team, Does Spark 2.0 support non-primitive types in collect_list for inserting nested collections? Would appreciate any references or samples. Thanks, Ninad
Unsubscribe
Unsubscribe
Fwd: jdbcRDD for data ingestion from RDBMS
Hi Team, One of my client teams is evaluating whether they can use Spark instead of Sqoop to source data from an RDBMS. The data volume would be substantial, on the order of billions of records. From reading the documentation, I am not sure whether jdbcRDD is designed to scale well to this amount of data. In addition, some built-in Sqoop features such as --direct might give better performance than plain JDBC. My primary question to this group is whether it is advisable to use jdbcRDD for data sourcing and whether we can expect it to scale. Also, how would its performance compare to Sqoop? Please let me know your thoughts, and share any pointers if anyone in the group has already implemented this. Thanks, Ninad
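For context on the scaling question: jdbcRDD takes a lower bound, an upper bound, and a number of partitions, and each partition runs the same query over its own slice of a numeric key. The plain-Python sketch below only illustrates that kind of range splitting. It is not Spark's implementation, and the table name `orders` and key column `id` are hypothetical.

```python
from typing import List, Tuple


def split_key_range(lower: int, upper: int, num_partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [lower, upper] into contiguous inclusive slices."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper < lower:
        raise ValueError("upper must be >= lower")
    length = upper - lower + 1  # number of key values to cover
    slices = []
    for i in range(num_partitions):
        # Integer arithmetic keeps slices contiguous and non-overlapping.
        start = lower + (i * length) // num_partitions
        end = lower + ((i + 1) * length) // num_partitions - 1
        slices.append((start, end))
    return slices


def partition_queries(table: str, key: str, lower: int, upper: int,
                      num_partitions: int) -> List[str]:
    """Build one bounded query per non-empty slice (illustration only)."""
    return [
        f"SELECT * FROM {table} WHERE {key} >= {start} AND {key} <= {end}"
        for start, end in split_key_range(lower, upper, num_partitions)
        if start <= end  # skip empty slices when partitions outnumber keys
    ]


if __name__ == "__main__":
    for query in partition_queries("orders", "id", 1, 10, 3):
        print(query)
```

With this kind of splitting, parallelism depends on how evenly the key values are spread between the bounds: a skewed key leaves some partitions with most of the rows, which matters at billions of records.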