Re: specifying schema on dataframe

2017-02-05 Thread Sam Elamin
Ok thanks Michael! Can I get an idea of where to start? Assuming I have the end schema and the current dataframe, how can I loop through it and create a new dataframe using withColumn? Am I iterating through the dataframe or the schema? I'm assuming it's easier to iterate through the…
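
One possible shape of that loop, as a Scala sketch: iterate over the target schema (not the dataframe) and fold withColumn casts over it. The helper name and types below are made up for illustration, not from the thread:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.StructType

    // Fold over the desired schema, casting each existing column to its
    // target type; column names stay the same, only the types change.
    def applySchema(df: DataFrame, target: StructType): DataFrame =
      target.fields.foldLeft(df) { (acc, field) =>
        acc.withColumn(field.name, col(field.name).cast(field.dataType))
      }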

Re: Cannot read Hive Views in Spark SQL

2017-02-05 Thread KhajaAsmath Mohammed
Hi Khan, it didn't work in my case; I used the code below. The view is already present in Hive, but I can't read it in Spark SQL. It throws an exception that the table is not found: sqlCtx.refreshTable("schema.hive_view") Thanks, Asmath On Sun, Feb 5, 2017 at 7:56 PM, vaquar khan wrote:

Re: Cannot read Hive Views in Spark SQL

2017-02-05 Thread vaquar khan
Hi Asmath, Try refreshing the table: // spark is an existing SparkSession spark.catalog.refreshTable("my_table") http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing Regards, Vaquar khan On Sun, Feb 5, 2017 at 7:19 PM, KhajaAsmath Mohammed <

Cannot read Hive Views in Spark SQL

2017-02-05 Thread KhajaAsmath Mohammed
Hi, I have a Hive view which is basically a set of select statements on some tables. I want to read the Hive view and use the Hive built-in functions available in Spark SQL. I am not able to read that Hive view in Spark SQL, but I can retrieve the data in the Hive shell. Can't Spark access Hive views? Thanks,
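
A minimal sketch of reading a Hive view from Spark 2.x, assuming the Hive metastore is reachable; the key point is that the SparkSession must be created with Hive support, otherwise Hive objects (views included) are not visible and "table not found" errors follow. The schema and view names below are placeholders:

    import org.apache.spark.sql.SparkSession

    // Hive support must be enabled for the session to see Hive views.
    val spark = SparkSession.builder()
      .appName("read-hive-view")
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.sql("SELECT * FROM my_schema.my_hive_view")
    df.show()

On Spark 1.6.x (where sqlCtx-style code lives), the analogous requirement is to go through HiveContext rather than a plain SQLContext.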

Re: specifying schema on dataframe

2017-02-05 Thread Michael Armbrust
If you already have the expected schema, and you know that all numbers will always be formatted as strings in the input JSON, you could probably derive this list automatically. > Wouldn't it be simpler to just regex replace the numbers to remove the quotes? I think this is likely to be a slower…
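
Deriving that list from the expected schema could look like the following Scala sketch (assuming the target schema is already at hand; names are illustrative):

    import org.apache.spark.sql.types.{NumericType, StructType}

    // Every field whose expected type is numeric is a candidate for a
    // string-to-number cast after the JSON has been loaded as strings.
    def numericColumns(expected: StructType): Seq[String] =
      expected.fields.toSeq.collect {
        case f if f.dataType.isInstanceOf[NumericType] => f.name
      }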

Re: specifying schema on dataframe

2017-02-05 Thread Sam Elamin
I see, so for the connector I need to pass in an array/list of numerical columns? Wouldn't it be simpler to just regex replace the numbers to remove the quotes? Regards, Sam On Sun, Feb 5, 2017 at 11:11 PM, Michael Armbrust wrote: > Specifying the schema when parsing

Re: specifying schema on dataframe

2017-02-05 Thread Michael Armbrust
Specifying the schema when parsing JSON will only let you pick between similar datatypes (i.e. should this be a short, long, float, double, etc.). It will not let you perform conversions like string <-> number. This has to be done with explicit casts after the data has been loaded. I think you can…

Re: using an alternative slf4j implementation

2017-02-05 Thread Jacek Laskowski
Hi, Shading conflicting dependencies? Jacek On 5 Feb 2017 3:56 p.m., "Mendelson, Assaf" wrote: > Hi, > > Spark seems to explicitly use log4j. > > This means that if I use an alternative backend for my application (e.g. > ch.qos.logback) I have a conflict. > > Sure I
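
Jacek's suggestion could be realized with sbt-assembly's shade rules; a sketch only, assuming the application is packaged as a fat jar with sbt-assembly. The shaded package name is arbitrary, and whether relocating the slf4j classes is the right granularity depends on the particular build:

    // build.sbt — relocate the slf4j classes that Spark's dependencies use
    // so they cannot clash with the application's own logback binding.
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("org.slf4j.**" -> "shaded.org.slf4j.@1").inAll
    )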

Invalid checkpoint file on spark 1.6.2

2017-02-05 Thread zitang qin
Hello everyone, I was running the code below on Spark 1.6.2; could anyone help with the error message? Much appreciated.

    df_batch_rdd = hc.read.option("basePath", T_CF_Prefix).parquet(*tcfFolderString).rdd
    df_batch_rdd.cache()
    df_batch_rdd.checkpoint()
    print…
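
The full error is cut off here, but one common setup issue with RDD checkpointing is calling checkpoint() without first setting a reliable checkpoint directory. A minimal Scala sketch of the required ordering, with a placeholder path (the snippet above is PySpark, but the API is the same):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo"))
    // The checkpoint directory must be set before checkpoint() is called,
    // and on a cluster it should live on fault-tolerant storage such as HDFS.
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")

    val rdd = sc.parallelize(1 to 100).cache()   // cache to avoid recomputation
    rdd.checkpoint()
    rdd.count()   // an action forces the checkpoint to materialize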

Re: High Availability/DR options for Spark applications

2017-02-05 Thread Ashok Kumar
Hi, High Availability means that the system, including Spark, will carry on with minimal disruption in case of an active component failure. DR, or disaster recovery, means total fail-over to another location with its own nodes, HDFS, and Spark cluster. Thanks On Sunday, 5 February 2017, 20:15,

Re: specifying schema on dataframe

2017-02-05 Thread Sam Elamin
Thanks Michael, I've been spending the past few days researching this. The problem is that the generated JSON has double quotes on fields that are numbers, because the producing datastore doesn't want to lose precision. I can change the data type, true, but that would be specific to a job rather than a…

Re: specifying schema on dataframe

2017-02-05 Thread Michael Armbrust
-dev You can use withColumn to change the type after the data has been loaded. On Sat, Feb 4, 2017 at 6:22 AM, Sam Elamin

FileNotFoundException, while file is actually available

2017-02-05 Thread Evgenii Morozov
Hi, I see a lot of exceptions like the following during our machine learning pipeline calculation. Spark version 2.0.2. Sometimes it's just a few executors that fail with this message, but the job is successful. I'd appreciate any hint you might have. Thank you. 2017-02-05 07:56:47.022

Unsubscribe

2017-02-05 Thread satish saley
Unsubscribe Sent from Yahoo Mail for iPhone

Re: spark architecture question -- Pleas Read

2017-02-05 Thread kuassi mensah
Apologies in advance for injecting an Oracle product into this discussion, but I thought it might help address the requirements (as far as I understood them). We are looking into furnishing a new connector for Spark, similar to the Oracle Datasource for Hadoop,

Re: Turning rows into columns

2017-02-05 Thread Koert Kuipers
Since there is no key to group by and assemble records, I would suggest writing this in RDD land and then converting to a data frame. You can use sc.wholeTextFiles to process the text files and create a state machine. On Feb 4, 2017 16:25, "Paul Tremblay" wrote: I am using…
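
A rough Scala sketch of the shape of that approach, with a made-up record format (records are assumed to start at lines beginning with "BEGIN"; the real delimiters would come from the actual input, and the path is a placeholder):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("rows-to-columns").getOrCreate()
    import spark.implicits._

    val records = spark.sparkContext
      .wholeTextFiles("hdfs:///path/to/input")
      .flatMap { case (_, content) =>
        // Tiny state machine: a "BEGIN" line opens a new record, any other
        // line is appended to the record currently being built.
        content.split("\n").foldLeft(List.empty[List[String]]) {
          case (acc, line) if line.startsWith("BEGIN") => List(line) :: acc
          case (current :: rest, line) => (line :: current) :: rest
          case (Nil, line) => List(List(line))
        }.map(_.reverse.mkString("|"))
      }

    val df = records.toDF("record")   // back to DataFrame land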

Re: High Availability/DR options for Spark applications

2017-02-05 Thread Jacek Laskowski
Hi, I'm not very familiar with "High Availability/DR operations". Could you explain what it is? My very limited understanding of the phrase allows me to think that with YARN and cluster deploy mode you get failure recovery for free, so when your driver dies YARN will attempt to resurrect it a few…
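
The number of those restart attempts is configurable; a sketch of a submission that raises it, with placeholder class and jar names (spark.yarn.maxAppAttempts is capped by YARN's own yarn.resourcemanager.am.max-attempts):

    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.maxAppAttempts=4 \
      --class com.example.MyApp \
      my-app.jar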

Re: Spark 2 + Java + UDF + unknown return type...

2017-02-05 Thread Koert Kuipers
A UDF that does not return a single type is not supported, and Spark has no concept of union types. On Feb 2, 2017 16:05, "Jean Georges Perrin" wrote: Hi fellow Sparkans, I am building a UDF (in Java) that can return various data types; basically the signature of the function…
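
Two common workarounds, sketched in Scala (the names are illustrative, not from the thread): return the widest sensible type, such as string, and cast downstream; or return a struct of nullable fields, one per possible type, which stands in for the missing union type:

    import org.apache.spark.sql.functions.udf

    // Option 1: always return String and cast later where needed.
    val asString = udf { (raw: String) => raw.trim }

    // Option 2: a struct of nullable fields playing the role of a union type.
    case class IntOrString(i: Option[Int], s: Option[String])
    val parsed = udf { (raw: String) =>
      scala.util.Try(raw.toInt).toOption match {
        case Some(n) => IntOrString(Some(n), None)
        case None    => IntOrString(None, Some(raw))
      }
    }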

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-05 Thread Debasish Das
Hi Aseem, due to a production deploy we did not upgrade to 2.0, but that's a critical item on our list. For exposing models out of PipelineModel, let me look into the ML tasks... we should add it, since a dataframe should not be a must for model scoring; many times models are scored in an API or streaming…

using an alternative slf4j implementation

2017-02-05 Thread Mendelson, Assaf
Hi, Spark seems to explicitly use log4j. This means that if I use an alternative backend for my application (e.g. ch.qos.logback) I have a conflict. Sure, I can exclude logback, but that means my application cannot use our internal tools. Is there a way to use logback as a logging backend while…
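
Besides shading, one widely used arrangement is to exclude the log4j binding that Spark brings in and route its log4j calls through the log4j-over-slf4j bridge to logback. A sketch as sbt settings, with illustrative versions to be adjusted to the actual build:

    // build.sbt — drop Spark's log4j binding, bridge log4j calls to slf4j,
    // and let logback serve as the single slf4j backend.
    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "2.1.0")
        .exclude("org.slf4j", "slf4j-log4j12")
        .exclude("log4j", "log4j"),
      "org.slf4j" % "log4j-over-slf4j" % "1.7.25",
      "ch.qos.logback" % "logback-classic" % "1.2.3"
    )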

High Availability/DR options for Spark applications

2017-02-05 Thread Ashok Kumar
Hello, What are the current High Availability/DR practices for a Spark cluster? I am especially interested in the case where YARN is used as the resource manager. Thanks

Re: spark architecture question -- Pleas Read

2017-02-05 Thread Mich Talebzadeh
Agreed. The best option is to ingest into staging tables in Oracle. Many people ingest into the main Oracle table, which is a wrong design in my opinion. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: spark architecture question -- Pleas Read

2017-02-05 Thread Jörn Franke
You should see an exception, and your job fails by default after, I think, 4 attempts. If you see an exception you may want to clean the staging table used for loading and reload again. > On 4 Feb 2017, at 09:06, Mich Talebzadeh wrote: > > Ingesting from Hive tables back