Spark SQL reads all leaf directories on a partitioned Hive table

2019-08-07 Thread Hao Ren
quet files directly. Spark has partition awareness for partitioned directories. Still, I would like to know whether there is a way to leverage partition awareness via Hive using the `spark.sql` API. Any help is highly appreciated! Thank you. -- Hao Ren

Re: [SPARK-2.0][SQL] UDF containing non-serializable object does not work as expected

2016-08-08 Thread Hao Ren
Yes, it is. You can define a UDF like that. Basically, it's a UDF Int => Int whose closure contains a non-serializable object. The latter should cause a Task not serializable exception. Hao On Mon, Aug 8, 2016 at 5:08 AM, Muthu Jayakumar <bablo...@gmail.com> wrote: > H

[SPARK-2.0][SQL] UDF containing non-serializable object does not work as expected

2016-08-07 Thread Hao Ren
($"key" === 2).show() // *It does not work as expected (org.apache.spark.SparkException: Task not serializable)* } run() } Also, I tried collect(), count(), first(), limit(). All of them worked without non-serializable exceptions. It seems only filter() throws the exception

[MLlib] Term Frequency in TF-IDF seems incorrect

2016-08-01 Thread Hao Ren
? -- Hao Ren Data Engineer @ leboncoin Paris, France

SparkSQL can not extract values from UDT (like VectorUDT)

2015-10-12 Thread Hao Ren
ache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L49 It seems that the pattern matching does not take UDT into consideration. Is this an intended feature? If not, I would like to create a PR to fix it. -- Hao Ren Data Engineer @ leboncoin Paris, France

Re: [MLlib] BinaryLogisticRegressionSummary on test set

2015-09-18 Thread Hao Ren
ll/7099/files#diff-668c79317c51f40df870d3404d8a731fR272>); > perhaps you could push for this to happen by creating a Jira and pinging > jkbradley and mengxr. Thanks! > > On Thu, Sep 17, 2015 at 8:07 AM, Hao Ren <inv...@gmail.com> wrote: > >> Working on spark.ml.classification.LogisticRegression.s

S3 Read / Write makes executors deadlocked

2015-07-16 Thread Hao Ren
(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- Hao Ren Data Engineer @ leboncoin Paris, France

Re: S3 Read / Write makes executors deadlocked

2015-07-16 Thread Hao Ren
is highly appreciated. If you need more info, check out the JIRA I created: https://issues.apache.org/jira/browse/SPARK-8869 On Thu, Jul 16, 2015 at 11:39 AM, Hao Ren inv...@gmail.com wrote: Given the following code, which just reads from S3 and then saves files to S3 val