Re: Orc predicate pushdown with Spark Sql

2017-10-24 Thread Jörn Franke
Well, the meta information is in the file, so I am not surprised that it reads the file, but it should not read all the content, which is probably also not happening.
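
A minimal sketch of the setting involved, assuming Spark 2.x: ORC predicate pushdown is controlled by spark.sql.orc.filterPushdown, and with it enabled Spark can use the min/max statistics stored in the ORC file and stripe metadata to skip data instead of scanning all of the content.

    import org.apache.spark.sql.SparkSession

    // Enable ORC predicate pushdown so filters can be evaluated against the
    // statistics kept in the ORC metadata (historically off by default in Spark 2.x).
    val spark = SparkSession.builder()
      .appName("orc-pushdown-sketch")
      .config("spark.sql.orc.filterPushdown", "true")
      .getOrCreate()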

Re: spark session jdbc performance

2017-10-24 Thread lucas.g...@gmail.com
Sorry, I meant to say: "That code looks SANE to me". Assuming that you're seeing the query running partitioned as expected, then you're likely configured with one executor. Very easy to check in the UI. Gary Lucas
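
For a quick check from the driver rather than the UI, a small sketch (assuming the JDBC DataFrame from this thread is called df and the session is spark; both names are placeholders):

    // If this prints 1, the JDBC read is not parallelized, regardless of how
    // many executors are available.
    println(s"JDBC read partitions: ${df.rdd.getNumPartitions}")

    // Executor settings in effect, if they were set explicitly on submission.
    println(spark.conf.getOption("spark.executor.instances"))
    println(spark.conf.getOption("spark.executor.cores"))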

Re: Spark streaming for CEP

2017-10-24 Thread lucas.g...@gmail.com
This looks really interesting, thanks for linking! Gary Lucas

Null array of cols

2017-10-24 Thread Mohit Anchlia
I am trying to understand the best way to handle the scenario where a null array "[]" is passed. Can somebody suggest if there is a way to filter out such records? I've tried numerous things, including dataframe.head().isEmpty, but pyspark doesn't recognize isEmpty even though I see it in the
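
A minimal sketch of one way to do this, assuming a DataFrame df with an array column named "items" (the column name is made up for illustration); the same size function also exists in pyspark.sql.functions:

    import org.apache.spark.sql.functions.{col, size}

    // size(...) returns -1 for a null array and 0 for an empty one, so a single
    // predicate drops both kinds of records.
    val nonEmpty = df.filter(size(col("items")) > 0)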

Re: spark session jdbc performance

2017-10-24 Thread lucas.g...@gmail.com
Did you check the query plan / check the UI? That code looks same to me. Maybe you've only configured for one executor? Gary
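
One way to look at the query plan from code, assuming df is the JDBC DataFrame in question:

    // Prints the physical plan; the JDBC scan node shows the generated query
    // and any filters that were pushed down to the database.
    df.explain(true)   // true also prints the parsed/analyzed/optimized plans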

Re: Spark streaming for CEP

2017-10-24 Thread Mich Talebzadeh
Great thanks Steve

Re: Spark streaming for CEP

2017-10-24 Thread Stephen Boesch
Hi Mich, the GitHub link has a brief intro, including a link to the formal docs http://logisland.readthedocs.io/en/latest/index.html . They have an architectural overview, developer guide, tutorial, and pretty comprehensive API docs.

spark session jdbc performance

2017-10-24 Thread Naveen Madhire
Hi, I am trying to fetch data from Oracle DB using a subquery and experiencing a lot of performance issues. Below is the query I am using (using Spark 2.0.2): val df = spark_session.read.format("jdbc") .option("driver", "oracle.jdbc.OracleDriver") .option("url", jdbc_url)

spark session jdbc performance

2017-10-24 Thread Madhire, Naveen
Hi, I am trying to fetch data from Oracle DB using a subquery and experiencing a lot of performance issues. Below is the query I am using (using Spark 2.0.2): val df = spark_session.read.format("jdbc") .option("driver", "oracle.jdbc.OracleDriver") .option("url", jdbc_url) .option("user", user)
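
For reference, a hedged sketch of the same read with JDBC partitioning options added (Spark 2.0.x Scala API); the partition column, bounds, and partition count below are placeholders. Without these four options Spark issues a single query and pulls the whole result set through one task, which is a common cause of slow JDBC reads:

    val df = spark_session.read.format("jdbc")
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("url", jdbc_url)
      .option("user", user)
      .option("password", password)                  // assumed defined alongside jdbc_url/user
      .option("dbtable", "(SELECT ... FROM ...) t")  // the subquery, aliased as a derived table
      .option("partitionColumn", "ID")               // hypothetical numeric column
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load()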

Re: Spark streaming for CEP

2017-10-24 Thread Mich Talebzadeh
Thanks Thomas. Do you have a summary write-up for this tool please? Regards, Mich

Re: Orc predicate pushdown with Spark Sql

2017-10-24 Thread Siva Gudavalli
Hello, I have an update here. Spark SQL is pushing predicates down if I load the ORC files in the Spark context, but it is not the same when I try to read the Hive table directly. Please let me know if I am missing something here. Is this supported in Spark? When I load the files in the Spark context
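
A small sketch for comparing the two paths described here; the path, table, and column names are made up for illustration. With spark.sql.orc.filterPushdown enabled, the physical plan for the file-based read should list the predicate under PushedFilters, while a read through the Hive table may take the Hive SerDe path, where pushdown behaves differently:

    spark.conf.set("spark.sql.orc.filterPushdown", "true")

    // Reading the ORC files directly through the data source path.
    val fromFiles = spark.read.orc("/data/events_orc").filter("event_date = '2017-10-24'")
    fromFiles.explain()   // look for PushedFilters: [...] in the scan node

    // Reading the same data through the Hive table.
    val fromHive = spark.table("db.events_orc").filter("event_date = '2017-10-24'")
    fromHive.explain()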

Re: Spark streaming for CEP

2017-10-24 Thread Thomas Bailet
Hi, we (@hurence) have released an open source middleware based on Spark Streaming over Kafka to do CEP and log mining, called logisland (https://github.com/Hurence/logisland/). It has been deployed in production for 2 years now and does a great job. You should have a look. Bye, Thomas

Databricks Certification Registration

2017-10-24 Thread sanat kumar Patnaik
Hello All, Can anybody here please provide me a link to register for the Databricks Spark developer certification (US based)? I have been googling but always end up with this page in the end:

Re: Zero Coefficient in logistic regression

2017-10-24 Thread Alexis Peña
Thanks, 8/10 coefficients are estimated as zero in CRUZADAS; the parameters for alpha and lambda are set to the defaults (I think zero). The model in R and SAS was fitted using glm binary logistic. Cheers

Re: Zero Coefficient in logistic regression

2017-10-24 Thread Simon Dirmeier
So, all the coefficients are the same except for CRUZADAS? How are you fitting the model in R (glm)? Can you try setting zero penalty for alpha and lambda: .setRegParam(0) .setElasticNetParam(0) Cheers, S
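
A minimal sketch of that suggestion (Spark ML, Scala API), assuming a training DataFrame named train with the usual features/label columns; with both parameters at zero the model is unpenalized and the coefficients become comparable to a plain glm fit in R or SAS:

    import org.apache.spark.ml.classification.LogisticRegression

    val lr = new LogisticRegression()
      .setRegParam(0.0)         // penalty strength: 0 disables regularization
      .setElasticNetParam(0.0)  // L1/L2 mixing: irrelevant once regParam is 0
      .setMaxIter(100)

    val model = lr.fit(train)
    println(model.coefficients)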

Re: Zero Coefficient in logistic regression

2017-10-24 Thread Alexis Peña
Thanks for your answer. The features “Cruzadas” are binary (0/1). The chi-squared statistic should work with 2x2 tables. I fit the model in SAS and R and in both the coefficients have estimates (not significant). Two of these features have estimates: CRUZADAS49070,247624087

Re: Zero Coefficient in logistic regression

2017-10-24 Thread Weichen Xu
Yes, the chi-squared statistic is only used with categorical features. It does not look appropriate here. Thanks!

Re: Zero Coefficient in logistic regression

2017-10-24 Thread Simon Dirmeier
Hey, as far as I know, feature selection using a chi-squared statistic can only be done on categorical features and not on possibly continuous ones. Furthermore, since your logistic model doesn't use any regularization, you should be fine there. So I'd check the ChiSqSelector and possibly
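
For context, a hedged sketch of the selector being discussed (Spark ML, Scala); the column names are made up. The chi-squared test it applies only makes statistical sense when the input features are categorical, e.g. 0/1 indicators:

    import org.apache.spark.ml.feature.ChiSqSelector

    val selector = new ChiSqSelector()
      .setNumTopFeatures(10)            // keep the 10 features most dependent on the label
      .setFeaturesCol("binaryFeatures") // should hold categorical / 0-1 features only
      .setLabelCol("label")
      .setOutputCol("selectedFeatures")

    val selected = selector.fit(train).transform(train)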

Fwd: Spark 1.x - End of life

2017-10-24 Thread Ismaël Mejía
Thanks for your answer Matei. I agree that a more explicit maintenance policy is needed (even for the 2.x releases). I did not immediately find anything about this on the website, so I ended up assuming the information in the Wikipedia article that says that the 1.6.x line is still maintained. I

Accessing UI for Spark running as Kubernetes container on standby name node

2017-10-24 Thread Mohit Gupta
Hi, We are launching all Spark jobs as Kubernetes (k8s) containers inside a k8s cluster. We also create a service for each job and do port forwarding for the Spark UI (the container's 4040 is mapped to service port 31123). The same set of nodes is also hosting a YARN cluster. Inside the container, we do