[dev-platform] Intent to prototype: Symbols as WeakMap keys (ECMA 262)

2023-09-27 Thread Yoshi Cheng-Hao Huang
*Summary:* This proposal allows Symbols to be used as keys in WeakMap/WeakSet, and as the target in WeakRef/FinalizationRegistry. *Bug:* https://bugzilla.mozilla.org/show_bug.cgi?id=1828144 *Standards body:* https://tc39.es/proposal-symbols-as-weakmap-keys/ This proposal has been merged into

[dev-platform] Intent to ship: Well-Formed Unicode Strings

2023-09-06 Thread Yoshi Cheng-Hao Huang
As of Firefox 119, I intend to turn on Well-Formed Unicode Strings (JS) by default Bug to turn on by default: https://bugzilla.mozilla.org/show_bug.cgi?id=1850755 Standard: https://tc39.es/ecma262/#sec-string.prototype.iswellformed https://tc39.es/ecma262/#sec-string.prototype.towellformed The

[dev-platform] Intent to ship: Module Workers

2023-05-02 Thread Yoshi Cheng-Hao Huang
[I am sending this on behalf of Yulia Startsev, who is taking parental leave now] Summary: Module workers enable you to use ECMAScript modules in workers rather than just classic scripts. This enables the `import`/`export` style syntax to be run in a worker. It also enables dynamic import to

[dev-platform] Intent to prototype: Well-Formed Unicode Strings

2023-03-28 Thread Yoshi Cheng-Hao Huang
*Summary:* Add String.prototype.isWellFormed and String.prototype.toWellFormed methods. isWellFormed() returns a boolean indicating whether the string is well-formed UTF-16. toWellFormed() will return a string with the all

[dev-platform] Intent to Ship: Import maps

2022-10-17 Thread Yoshi Cheng-Hao Huang
As of Firefox 108, I intend to turn Import-maps on by default. Bug to turn on by default: https://bugzilla.mozilla.org/show_bug.cgi?id=1795647 Standard: https://html.spec.whatwg.org/multipage/webappapis.html#import-maps The feature was previously discussed in this "Intent to prototype: Import

[dev-platform] Intent to prototype: Import maps

2022-04-04 Thread Yoshi Cheng-Hao Huang
Summary: Import maps allow control over what URLs get fetched by JavaScript import statements and import() expressions. Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1688879 Standards body: https://wicg.github.io/import-maps/ Link to standards positions:

RE: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Cheng, Hao
Congratulations!! Jerry, you really deserve it. Hao -Original Message- From: Mridul Muralidharan [mailto:mri...@gmail.com] Sent: Tuesday, August 29, 2017 12:04 PM To: Matei Zaharia Cc: dev ; Saisai Shao Subject:

RE: [VOTE] HADOOP-12756 - Aliyun OSS Support branch merge

2016-09-28 Thread Cheng, Hao
this to trunk would serve as a very good beginning for the following optimizations aligning with the related efforts together. The new hadoop-aliyun module is made possible owing to many people. Thanks to the contributors Mingfei Shi, Genmao Yu and Ling Zhou; thanks to Cheng Hao, Steve Loughran

[jira] [Commented] (SPARK-17299) TRIM/LTRIM/RTRIM strips characters other than spaces

2016-08-31 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451914#comment-15451914 ] Cheng Hao commented on SPARK-17299: --- Or come after SPARK-14878 ? > TRIM/LTRIM/RTRIM strips charact

[jira] [Commented] (SPARK-17299) TRIM/LTRIM/RTRIM strips characters other than spaces

2016-08-31 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451810#comment-15451810 ] Cheng Hao commented on SPARK-17299: --- Yes, that's my bad, I thought it should be the same behavior

RE: [VOTE] Release of Apache Mnemonic-0.2.0-incubating [rc3]

2016-07-24 Thread Cheng, Hao
+1 Thanks, Hao -Original Message- From: Gary [mailto:ga...@apache.org] Sent: Monday, July 25, 2016 10:29 AM To: dev@mnemonic.incubator.apache.org Subject: Re: [VOTE] Release of Apache Mnemonic-0.2.0-incubating [rc3] +1 Thanks. +Gary. On 7/22/2016 3:08 PM, Gary wrote: > Hi all, > >

RE: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-14 Thread Cheng, Hao
-1 Breaks the existing applications while using the Script Transformation in Spark SQL, as the default Record/Column delimiter class changed since we don’t get the default conf value from HiveConf any more, see SPARK-16515; This is a regression. From: Reynold Xin [mailto:r...@databricks.com]

[jira] [Created] (SPARK-15859) Optimize the Partition Pruning with Disjunction

2016-06-09 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-15859: - Summary: Optimize the Partition Pruning with Disjunction Key: SPARK-15859 URL: https://issues.apache.org/jira/browse/SPARK-15859 Project: Spark Issue Type

[jira] [Commented] (SPARK-15730) [Spark SQL] the value of 'hiveconf' parameter in Spark-sql CLI don't take effect in spark-sql session

2016-06-07 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318654#comment-15318654 ] Cheng Hao commented on SPARK-15730: --- [~jameszhouyi], can you please verify this fixing? > [Spark

RE: [VOTE] Accept CarbonData into the Apache Incubator

2016-05-25 Thread Cheng, Hao
+1 -Original Message- From: Jacques Nadeau [mailto:jacq...@apache.org] Sent: Thursday, May 26, 2016 8:26 AM To: general@incubator.apache.org Subject: Re: [VOTE] Accept CarbonData into the Apache Incubator +1 (binding) On Wed, May 25, 2016 at 4:04 PM, John D. Ament

[jira] [Commented] (SPARK-15034) Use the value of spark.sql.warehouse.dir as the warehouse location instead of using hive.metastore.warehouse.dir

2016-05-25 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300072#comment-15300072 ] Cheng Hao commented on SPARK-15034: --- [~yhuai], but it probably does not respect the `hive-site.xml

[jira] [Commented] (SPARK-13894) SQLContext.range should return Dataset[Long]

2016-03-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195274#comment-15195274 ] Cheng Hao commented on SPARK-13894: --- The existing functions "SQLContext.range()" returns the

[jira] [Commented] (SPARK-13326) Dataset in spark 2.0.0-SNAPSHOT missing columns

2016-03-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195022#comment-15195022 ] Cheng Hao commented on SPARK-13326: --- Can not reproduce it anymore, can you try it again? > Data

RE: [VOTE] Accept Mnemonic into the Apache Incubator

2016-03-03 Thread Cheng, Hao
SF > > infrastructure and in turn made available under the Apache License, > > version 2.0. > > > > === External Dependencies === > > The required external dependencies are all Apache licenses or other > > compatible Licenses > > Note: The runtime dependent licenses of M

RE: Spark Streaming - graceful shutdown when stream has no more data

2016-02-24 Thread Cheng, Hao
This is very interesting: how to shut down the streaming job gracefully once there has been no input data for some time. A doable solution is probably to count the input data by using an Accumulator, and another thread (in the master node) will keep polling the latest accumulator value; if there is no value
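A minimal sketch of that accumulator-based approach (Spark 1.x Streaming API; the input source, batch interval, idle threshold and counter name below are invented for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopWhenIdle {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("stop-when-idle"), Seconds(10))

    // Driver-readable counter of how many records have arrived so far.
    val seen = ssc.sparkContext.accumulator(0L, "records seen")

    val lines = ssc.socketTextStream("localhost", 9999)   // hypothetical input source
    lines.foreachRDD { rdd =>
      seen += rdd.count()                                  // bump the counter every batch
      // ... real processing of `rdd` goes here ...
    }

    ssc.start()

    // Monitor thread: if the counter stops moving for `maxIdleMs`, stop gracefully
    // so that in-flight batches are finished before shutdown.
    val maxIdleMs = 5 * 60 * 1000L
    new Thread(new Runnable {
      override def run(): Unit = {
        var last = seen.value
        var lastChange = System.currentTimeMillis()
        while (true) {
          Thread.sleep(30 * 1000)
          val now = seen.value
          if (now != last) { last = now; lastChange = System.currentTimeMillis() }
          else if (System.currentTimeMillis() - lastChange > maxIdleMs) {
            ssc.stop(stopSparkContext = true, stopGracefully = true)
            return
          }
        }
      }
    }).start()

    ssc.awaitTermination()
  }
}
```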

[jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation

2016-02-03 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131540#comment-15131540 ] Cheng Hao commented on HADOOP-12756: Thank you so much [~ste...@apache.org], [~cnauroth

[jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation

2016-02-01 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127645#comment-15127645 ] Cheng Hao commented on HADOOP-12756: +1 This is critical for AliYun users when integrated

RE: Spark SQL joins taking too long

2016-01-27 Thread Cheng, Hao
Another possibility is the parallelism? It is probably 1 or some other small value, since the input data size is not that big. If that is the case, you can probably try something like: Df1.repartition(10).registerTempTable(“hospitals”); Df2.repartition(10).registerTempTable(“counties”); … And
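Spelled out a bit more (a rough spark-shell sketch; file paths, table and column names are invented — the only point is forcing more partitions before the join):

```scala
// spark-shell style: `sqlContext` is the one the shell already provides.
val hospitals = sqlContext.read.json("/path/to/hospitals.json")
val counties  = sqlContext.read.json("/path/to/counties.json")

// Small inputs often end up in a single partition; repartition to raise the
// parallelism of the subsequent join.
hospitals.repartition(10).registerTempTable("hospitals")
counties.repartition(10).registerTempTable("counties")

val joined = sqlContext.sql(
  "SELECT h.name, c.name FROM hospitals h JOIN counties c ON h.county_id = c.id")
joined.explain()   // check the plan and the parallelism actually used
```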

RE: JSON to SQL

2016-01-27 Thread Cheng, Hao
Have you ever tried the DataFrame API, like sqlContext.read.json("/path/to/file.json")? Spark SQL will auto-infer the type/schema for you. And lateral view will help with the flattening issues, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView, as well as the “a.b[0].c”
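A small spark-shell sketch of that suggestion (the JSON layout and the field names `a`, `b`, `c` are assumed, mirroring the “a.b[0].c” example above):

```scala
// spark-shell style: `sqlContext` is the one the shell already provides.
import org.apache.spark.sql.functions.explode
import sqlContext.implicits._

val df = sqlContext.read.json("/path/to/file.json")   // schema is inferred automatically
df.printSchema()

// Nested fields can be addressed directly ...
df.select($"a.b".getItem(0).getField("c")).show()

// ... and arrays of structs can be flattened with explode, the DataFrame
// counterpart of Hive's LATERAL VIEW.
df.select(explode($"a.b").as("b")).select($"b.c").show()
```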

RE: HiBench as part of Bigtop

2016-01-26 Thread Cheng, Hao
been working on BigPetStore, a family of example applications (blueprints). We've also been working on data generators to create relatively complex fake data for those demo apps. On Tue, Jan 26, 2016 at 9:10 PM, Cheng, Hao <hao.ch...@intel.com> wrote: > Hi RJ, > > Currently, w

RE: HiBench as part of Bigtop

2016-01-26 Thread Cheng, Hao
maintainers list. This way we know who to contact if a build fails and blocks a release. If there is no maintainer (and the build has problems), we use that as grounds for removing the package from Bigtop. Thanks! RJ On Tue, Jan 26, 2016 at 1:15 AM, Cheng, Hao <hao.ch...@intel.com>

RE: HiBench as part of Bigtop

2016-01-25 Thread Cheng, Hao
On Tue, Jan 26, 2016 at 05:37AM, Cheng, Hao wrote: > Dear BigTop Devs, > > I am from Intel Big Data Technology team, and we are the owner of > HiBench, an open source benchmark suite for Hadoop / Spark ecosystem; > as widely used, more and more trivial requirements from HiBenc

HiBench as part of Bigtop

2016-01-25 Thread Cheng, Hao
Dear BigTop Devs, I am from Intel Big Data Technology team, and we are the owner of HiBench, an open source benchmark suite for Hadoop / Spark ecosystem; as widely used, more and more trivial requirements from HiBench users, due to the limited resources, particularly the ease of deployment, we

[jira] [Updated] (SPARK-12610) Add Anti Join Operators

2016-01-03 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-12610: -- Issue Type: Sub-task (was: New Feature) Parent: SPARK-4226 > Add Anti Join Operat

[jira] [Created] (SPARK-12610) Add Anti Join Operators

2016-01-03 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-12610: - Summary: Add Anti Join Operators Key: SPARK-12610 URL: https://issues.apache.org/jira/browse/SPARK-12610 Project: Spark Issue Type: New Feature

RE: Problem with WINDOW functions?

2015-12-29 Thread Cheng, Hao
Which version are you using? Have you tried the 1.6? From: Vadim Tkachenko [mailto:apache...@gmail.com] Sent: Wednesday, December 30, 2015 10:17 AM To: Cheng, Hao Cc: user@spark.apache.org Subject: Re: Problem with WINDOW functions? When I allocate 200g to executor, it is able to make better

RE: Problem with WINDOW functions?

2015-12-29 Thread Cheng, Hao
Can you try to write the result into another file instead? Let's see if there is any issue on the executor side. sqlContext.sql("SELECT day,page,dense_rank() OVER (PARTITION BY day ORDER BY pageviews DESC) as rank FROM d1").filter("rank <= 20").sort($"day",$"rank").write.parquet("/path/to/file")

RE: Problem with WINDOW functions?

2015-12-29 Thread Cheng, Hao
Is there any improvement if you set a bigger memory for executors? -Original Message- From: va...@percona.com [mailto:va...@percona.com] On Behalf Of Vadim Tkachenko Sent: Wednesday, December 30, 2015 9:51 AM To: Cheng, Hao Cc: user@spark.apache.org Subject: Re: Problem with WINDOW

RE: Problem with WINDOW functions?

2015-12-29 Thread Cheng, Hao
s etc. will be more helpful in understanding your problem. From: Vadim Tkachenko [mailto:apache...@gmail.com] Sent: Wednesday, December 30, 2015 10:49 AM To: Cheng, Hao Subject: Re: Problem with WINDOW functions? I use 1.5.2. Where can I get 1.6? I do not see it on http://spark.apache.org/downloads.html T

RE: Does Spark SQL support rollup like HQL

2015-12-29 Thread Cheng, Hao
Hi, currently, the Simple SQL Parser of SQLContext is quite weak and doesn’t support rollup, but you can check the code at https://github.com/apache/spark/pull/5080/ , which aimed to add that support, in case you want to patch it into your own branch. In Spark 2.0, the simple SQL Parser will
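For reference, rollup is also reachable through the DataFrame API (added around 1.4), which sidesteps the simple SQL parser entirely; a rough sketch with made-up input and column names:

```scala
// spark-shell style: `sqlContext` is the one the shell already provides.
import org.apache.spark.sql.functions.sum

val sales = sqlContext.read.json("/path/to/sales.json")   // assumed columns: region, product, amount

sales.rollup("region", "product")
  .agg(sum("amount").as("total"))
  .show()
```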

[jira] [Commented] (SPARK-12196) Store blocks in different speed storage devices by hierarchy way

2015-12-28 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072634#comment-15072634 ] Cheng Hao commented on SPARK-12196: --- Thank you wei wu for supporting this feature! However, we're trying

[jira] [Updated] (SPARK-8360) Streaming DataFrames

2015-12-02 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-8360: - Attachment: StreamingDataFrameProposal.pdf This is a proposal for streaming dataframes that we were

[jira] [Comment Edited] (SPARK-8360) Streaming DataFrames

2015-12-02 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035335#comment-15035335 ] Cheng Hao edited comment on SPARK-8360 at 12/2/15 12:14 PM: Remove the google

[jira] [Comment Edited] (SPARK-8360) Streaming DataFrames

2015-12-01 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035335#comment-15035335 ] Cheng Hao edited comment on SPARK-8360 at 12/2/15 6:19 AM: --- Add some thoughts

[jira] [Commented] (SPARK-8360) Streaming DataFrames

2015-12-01 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035335#comment-15035335 ] Cheng Hao commented on SPARK-8360: -- Add some thoughts on StreamingSQL. https://docs.google.com/document

[jira] [Created] (SPARK-12064) Make the SqlParser as trait for better integrated with extensions

2015-11-30 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-12064: - Summary: Make the SqlParser as trait for better integrated with extensions Key: SPARK-12064 URL: https://issues.apache.org/jira/browse/SPARK-12064 Project: Spark

[jira] [Resolved] (SPARK-12064) Make the SqlParser as trait for better integrated with extensions

2015-11-30 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao resolved SPARK-12064. --- Resolution: Won't Fix DBX has plan to remove the SqlParser in 2.0. > Make the SqlParser as tr

RE: new datasource

2015-11-19 Thread Cheng, Hao
I think you probably need to write some code to support ES; there are 2 options per my understanding: Create a new Data Source from scratch, but you probably need to overwrite the interface at:

RE: A proposal for Spark 2.0

2015-11-12 Thread Cheng, Hao
I am not sure what the best practice is for this specific problem, but it’s really worth thinking about in 2.0, as it is a painful issue for lots of users. By the way, is it also an opportunity to deprecate the RDD API (or make it internal API only)? As lots of its functionality overlaps with

RE: A proposal for Spark 2.0

2015-11-12 Thread Cheng, Hao
DataFrames or DataSets don't fully fit. On Thu, Nov 12, 2015 at 5:17 PM, Cheng, Hao <hao.ch...@intel.com<mailto:hao.ch...@intel.com>> wrote: I am not sure what the best practice for this specific problem, but it’s really worth to think about it in 2.0, as it is a painful is

[jira] [Commented] (SPARK-10865) [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type

2015-11-11 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001423#comment-15001423 ] Cheng Hao commented on SPARK-10865: --- 1.5.2 is released, I am not sure whether part of it now

[jira] [Commented] (SPARK-10865) [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type

2015-11-11 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001422#comment-15001422 ] Cheng Hao commented on SPARK-10865: --- We actually follow the criteria of Hive, and actually I tested

RE: Sort Merge Join from the filesystem

2015-11-09 Thread Cheng, Hao
Yes, we definitely need to think about how to handle this case, which is probably even more common than the case where both tables are sorted/partitioned; can you jump to the JIRA and leave a comment there? From: Alex Nastetsky [mailto:alex.nastet...@vervemobile.com] Sent: Tuesday, November 10, 2015 3:03 AM To: Cheng, Hao Cc

RE: dataframe slow down with tungsten turn on

2015-11-05 Thread Cheng, Hao
turn on -- Forwarded message -- From: gen tang <gen.tan...@gmail.com<mailto:gen.tan...@gmail.com>> Date: Fri, Nov 6, 2015 at 12:14 AM Subject: Re: dataframe slow down with tungsten turn on To: "Cheng, Hao" <hao.ch...@intel.com<mailto:hao.ch...@inte

RE: Rule Engine for Spark

2015-11-04 Thread Cheng, Hao
Or try Streaming SQL? Which is a simple layer on top of the Spark Streaming. ☺ https://github.com/Intel-bigdata/spark-streamingsql From: Cassa L [mailto:lcas...@gmail.com] Sent: Thursday, November 5, 2015 8:09 AM To: Adrian Tanase Cc: Stefano Baghino; user Subject: Re: Rule Engine for Spark

RE: Sort Merge Join from the filesystem

2015-11-04 Thread Cheng, Hao
Yes, we probably need more changes to the data source API if we need to implement it in a generic way. BTW, I created the JIRA by copying most of the words from Alex. ☺ https://issues.apache.org/jira/browse/SPARK-11512 From: Reynold Xin [mailto:r...@databricks.com] Sent: Thursday, November 5, 2015

[jira] [Created] (SPARK-11512) Bucket Join

2015-11-04 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-11512: - Summary: Bucket Join Key: SPARK-11512 URL: https://issues.apache.org/jira/browse/SPARK-11512 Project: Spark Issue Type: Sub-task Components: SQL

[jira] [Commented] (SPARK-11512) Bucket Join

2015-11-04 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990867#comment-14990867 ] Cheng Hao commented on SPARK-11512: --- Oh, yes, but SPARK-5292 is only about to support the Hive bucket

[jira] [Commented] (SPARK-11512) Bucket Join

2015-11-04 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990868#comment-14990868 ] Cheng Hao commented on SPARK-11512: --- We need to support the "bucket" for DataSource API. >

RE: dataframe slow down with tungsten turn on

2015-11-04 Thread Cheng, Hao
BTW, 1 min V.S. 2 Hours, seems quite weird, can you provide more information on the ETL work? From: Cheng, Hao [mailto:hao.ch...@intel.com] Sent: Thursday, November 5, 2015 12:56 PM To: gen tang; dev@spark.apache.org Subject: RE: dataframe slow down with tungsten turn on 1.5 has critical

RE: Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation ?

2015-11-04 Thread Cheng, Hao
Probably 2 reasons: 1. HadoopFsRelation was introduced in 1.4, but it seems CsvRelation was created based on 1.3. 2. HadoopFsRelation introduces the concept of Partition, which is probably not necessary for LibSVMRelation. But I think it will be easy to change to extend from

RE: Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation ?

2015-11-04 Thread Cheng, Hao
problem as you described, probably we can add additional checking / reporting rule for the abuse. From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: Thursday, November 5, 2015 1:55 PM To: Cheng, Hao Cc: dev@spark.apache.org Subject: Re: Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation

RE: Sort Merge Join

2015-11-02 Thread Cheng, Hao
No as far as I can tell, @Michael @YinHuai @Reynold , any comments on this optimization? From: Jonathan Coveney [mailto:jcove...@gmail.com] Sent: Tuesday, November 3, 2015 4:17 AM To: Alex Nastetsky Cc: Cheng, Hao; user Subject: Re: Sort Merge Join Additionally, I'm curious if there are any

RE: Sort Merge Join

2015-11-01 Thread Cheng, Hao
1) Once SortMergeJoin is enabled, will it ever use ShuffledHashJoin? For example, in the code below, the two datasets have different number of partitions, but it still does a SortMerge join after a "hashpartitioning". [Hao:] A distributed JOIN operation (either HashBased or SortBased Join)
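A rough way to observe this (spark-shell sketch; the paths and join key are hypothetical, and `spark.sql.planner.sortMergeJoin` is the 1.4/1.5-era switch as far as I recall):

```scala
// spark-shell style: `sqlContext` is the one the shell already provides.
sqlContext.setConf("spark.sql.planner.sortMergeJoin", "true")

val left  = sqlContext.read.parquet("/path/to/left")
val right = sqlContext.read.parquet("/path/to/right")

// Both sides are shuffled (hashpartitioned) on the join key and then sorted,
// regardless of how many partitions they originally had.
left.join(right, left("id") === right("id")).explain()
// look for Exchange(hashpartitioning(...)) feeding SortMergeJoin in the output
```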

[jira] [Commented] (SPARK-10371) Optimize sequential projections

2015-10-29 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981650#comment-14981650 ] Cheng Hao commented on SPARK-10371: --- Eliminating the common sub expression within the projection

RE: [Spark-SQL]: Unable to propagate hadoop configuration after SparkContext is initialized

2015-10-28 Thread Cheng, Hao
Hi Jerry, I’ve filed a bug in JIRA, along with the fix: https://issues.apache.org/jira/browse/SPARK-11364 It will be greatly appreciated if you can verify the PR with your case. Thanks, Hao From: Cheng, Hao [mailto:hao.ch...@intel.com] Sent: Wednesday, October 28, 2015 8:51 AM To: Jerry Lam

[jira] [Commented] (SPARK-11330) Filter operation on StringType after groupBy PERSISTED brings no results

2015-10-28 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979646#comment-14979646 ] Cheng Hao commented on SPARK-11330: --- [~saif.a.ellafi] I've checked that with 1.5.0 and it's confirmed

[jira] [Comment Edited] (SPARK-11330) Filter operation on StringType after groupBy PERSISTED brings no results

2015-10-28 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979699#comment-14979699 ] Cheng Hao edited comment on SPARK-11330 at 10/29/15 2:48 AM: - OK, seems it's

[jira] [Commented] (SPARK-11330) Filter operation on StringType after groupBy PERSISTED brings no results

2015-10-28 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979699#comment-14979699 ] Cheng Hao commented on SPARK-11330: --- OK, seems it's solved in https://issues.apache.org/jira/browse

RE: SparkSQL on hive error

2015-10-27 Thread Cheng, Hao
Hi Anand, can you paste the table creating statement? I’d like to reproduce that in my local first, and BTW, which version are you using? Hao From: Anand Nalya [mailto:anand.na...@gmail.com] Sent: Tuesday, October 27, 2015 11:35 PM To: spark users Subject: SparkSQL on hive error Hi, I've a

RE: [Spark-SQL]: Unable to propagate hadoop configuration after SparkContext is initialized

2015-10-27 Thread Cheng, Hao
After a quick glance, it seems to be a bug in Spark SQL; do you mind creating a JIRA for this? Then I can start to fix it. Thanks, Hao From: Jerry Lam [mailto:chiling...@gmail.com] Sent: Wednesday, October 28, 2015 3:13 AM To: Marcelo Vanzin Cc: user@spark.apache.org Subject: Re: [Spark-SQL]:

[jira] [Comment Edited] (SPARK-11330) Filter operation on StringType after groupBy PERSISTED brings no results

2015-10-27 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977600#comment-14977600 ] Cheng Hao edited comment on SPARK-11330 at 10/28/15 2:28 AM: - Hi

[jira] [Commented] (SPARK-11330) Filter operation on StringType after groupBy PERSISTED brings no results

2015-10-27 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977600#comment-14977600 ] Cheng Hao commented on SPARK-11330: --- Hi, [~saif.a.ellafi], I've tried the code like below: {code} case

RE: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Cheng, Hao
I am not sure if we really want to support that with HiveContext, but a workaround is to use the Spark package at https://github.com/databricks/spark-csv From: Felix Cheung [mailto:felixcheun...@hotmail.com] Sent: Tuesday, October 27, 2015 10:54 AM To: Daniel Haviv; user Subject: RE: HiveContext
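A sketch of that spark-csv workaround (package coordinates and option names as documented by spark-csv at the time; the file path and table name are placeholders):

```scala
// spark-shell --packages com.databricks:spark-csv_2.10:1.2.0
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")        // consume the first line as a header instead of data
  .option("inferSchema", "true")
  .load("/path/to/data.csv")

df.registerTempTable("my_csv_table")
sqlContext.sql("SELECT * FROM my_csv_table LIMIT 10").show()
```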

[jira] [Updated] (SPARK-9735) Auto infer partition schema of HadoopFsRelation should should respected the user specified one

2015-10-20 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-9735: - Description: This code is copied from the hadoopFsRelationSuite.scala {code} partitionedTestDF

[jira] [Commented] (SPARK-4226) SparkSQL - Add support for subqueries in predicates

2015-10-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958524#comment-14958524 ] Cheng Hao commented on SPARK-4226: -- [~nadenf] Actually I am working on it right now, and the first PR

[jira] [Created] (SPARK-11076) Decimal Support for Ceil/Floor

2015-10-12 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-11076: - Summary: Decimal Support for Ceil/Floor Key: SPARK-11076 URL: https://issues.apache.org/jira/browse/SPARK-11076 Project: Spark Issue Type: Improvement

RE: Hive with apache spark

2015-10-11 Thread Cheng, Hao
One option is to read the data via JDBC; however, it's probably the worst option, as you probably need some hacky work to enable parallel reading in Spark SQL. Another option is to copy the hive-site.xml of your Hive Server to $SPARK_HOME/conf; then Spark SQL will see everything that Hive
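A sketch of the JDBC route (all connection details, the split column and its bounds are invented; the JDBC driver of the source database must be on the classpath):

```scala
// spark-shell style: `sqlContext` is the one the shell already provides.
import java.util.Properties

val props = new Properties()
props.setProperty("user", "hive")
props.setProperty("password", "secret")

// numPartitions range predicates on the split column give parallel reads
// instead of a single task pulling the whole table.
val df = sqlContext.read.jdbc(
  "jdbc:hive2://hiveserver:10000/default",  // url
  "my_table",                               // table
  "id",                                     // numeric column to split on
  0L, 1000000L,                             // lower / upper bound of the split column
  8,                                        // numPartitions
  props)

df.registerTempTable("my_table")
```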

RE: Saprk 1.5 - How to join 3 RDDs in a SQL DF?

2015-10-11 Thread Cheng, Hao
A join B join C === (A join B) join C Semantically they are equivalent, right? From: Richard Eggert [mailto:richard.egg...@gmail.com] Sent: Monday, October 12, 2015 5:12 AM To: Subhajit Purkayastha Cc: User Subject: Re: Saprk 1.5 - How to join 3 RDDs in a SQL DF? It's the same as joining 2.
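In DataFrame terms (a tiny sketch; the paths and the join key `k` are placeholders):

```scala
val a = sqlContext.read.parquet("/path/to/a")
val b = sqlContext.read.parquet("/path/to/b")
val c = sqlContext.read.parquet("/path/to/c")

// (a join b) join c -- for inner joins the grouping does not change the result.
val abc = a.join(b, "k").join(c, "k")
abc.explain()
```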

RE: Join Order Optimization

2015-10-11 Thread Cheng, Hao
Spark SQL supports a very basic join reordering optimization based on the raw table data size; this was added a couple of major releases back. And the “EXPLAIN EXTENDED query” command is a very informative tool to verify whether the optimization is taking effect. From: Raajay
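For example (spark-shell sketch; the query and table names are made up):

```scala
sqlContext.sql(
  """EXPLAIN EXTENDED
    |SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id""".stripMargin)
  .collect()
  .foreach(println)

// A DataFrame's .explain(true) prints the same parsed / analyzed / optimized /
// physical plans for the equivalent DataFrame query.
```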

RE: Saprk 1.5 - How to join 3 RDDs in a SQL DF?

2015-10-11 Thread Cheng, Hao
hih...@gmail.com] Sent: Monday, October 12, 2015 8:37 AM To: Cheng, Hao Cc: Richard Eggert; Subhajit Purkayastha; User Subject: Re: Saprk 1.5 - How to join 3 RDDs in a SQL DF? Some weekend reading: http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative Cheers On Sun, Oct 11, 2015 a

RE: Join Order Optimization

2015-10-11 Thread Cheng, Hao
Probably you have to read the source code, I am not sure if there are any .ppt or slides. Hao From: VJ Anand [mailto:vjan...@sankia.com] Sent: Monday, October 12, 2015 11:43 AM To: Cheng, Hao Cc: Raajay; user@spark.apache.org Subject: Re: Join Order Optimization Hi - Is there a design document

RE: Join Order Optimization

2015-10-11 Thread Cheng, Hao
, October 12, 2015 10:17 AM To: Cheng, Hao Cc: user@spark.apache.org Subject: Re: Join Order Optimization Hi Cheng, Could you point me to the JIRA that introduced this change ? Also, is this SPARK-2211 the right issue to follow for cost-based optimization? Thanks Raajay On Sun, Oct 11, 2015 at 7

RE: Insert via HiveContext is slow

2015-10-09 Thread Cheng, Hao
I think DF performs the same as the SQL API does in the multi-inserts, if you don’t use the cached table. Hao From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] Sent: Friday, October 9, 2015 3:09 PM To: Cheng, Hao Cc: user Subject: Re: Insert via HiveContext is slow Thanks Hao

[jira] [Closed] (SPARK-11041) Add (NOT) IN / EXISTS support for predicates

2015-10-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao closed SPARK-11041. - Resolution: Duplicate > Add (NOT) IN / EXISTS support for predica

[jira] [Created] (SPARK-11041) Add (NOT) IN / EXISTS support for predicates

2015-10-09 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-11041: - Summary: Add (NOT) IN / EXISTS support for predicates Key: SPARK-11041 URL: https://issues.apache.org/jira/browse/SPARK-11041 Project: Spark Issue Type

RE: Insert via HiveContext is slow

2015-10-08 Thread Cheng, Hao
I think that’s a known performance issue (compared to Hive) of Spark SQL in multi-inserts. A workaround is to create a temp cached table for the projection first, and then do the multiple inserts based on the cached table. We are actually working on the POC of some similar cases, hopefully it comes
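A rough sketch of that workaround (assuming a HiveContext bound to `hiveContext`; all table and column names are invented):

```scala
// Compute the shared projection once and cache it.
val projected = hiveContext.sql(
  "SELECT id, name, amount FROM raw_events WHERE dt = '2015-10-09'")
projected.registerTempTable("projected_events")
hiveContext.cacheTable("projected_events")

// The multiple inserts now scan the cached intermediate result instead of
// re-reading and re-projecting the source table for every INSERT.
hiveContext.sql("INSERT INTO TABLE target_a SELECT id, name   FROM projected_events")
hiveContext.sql("INSERT INTO TABLE target_b SELECT id, amount FROM projected_events")

hiveContext.uncacheTable("projected_events")
```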

[jira] [Updated] (SPARK-10992) Partial Aggregation Support for Hive UDAF

2015-10-07 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-10992: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-4366 > Partial Aggregation Supp

[jira] [Created] (SPARK-10831) Spark SQL Configuration missing in the doc

2015-09-25 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-10831: - Summary: Spark SQL Configuration missing in the doc Key: SPARK-10831 URL: https://issues.apache.org/jira/browse/SPARK-10831 Project: Spark Issue Type

[jira] [Created] (SPARK-10829) Scan DataSource with predicate expression combine partition key and attributes doesn't work

2015-09-24 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-10829: - Summary: Scan DataSource with predicate expression combine partition key and attributes doesn't work Key: SPARK-10829 URL: https://issues.apache.org/jira/browse/SPARK-10829

[jira] [Commented] (SPARK-10733) TungstenAggregation cannot acquire page after switching to sort-based

2015-09-23 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904778#comment-14904778 ] Cheng Hao commented on SPARK-10733: --- [~jameszhouyi] Can you please patch the https://github.com

RE: Performance Spark SQL vs Dataframe API faster

2015-09-22 Thread Cheng, Hao
Yes, it should be the same, as they are just different frontends but the same thing in optimization / execution. -Original Message- From: sanderg [mailto:s.gee...@wimionline.be] Sent: Tuesday, September 22, 2015 10:06 PM To: user@spark.apache.org Subject: Performance Spark SQL vs Dataframe

[jira] [Commented] (SPARK-10474) Aggregation failed with unable to acquire memory

2015-09-17 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802912#comment-14802912 ] Cheng Hao commented on SPARK-10474: --- The root reason for this failure, is because

[jira] [Comment Edited] (SPARK-10474) Aggregation failed with unable to acquire memory

2015-09-17 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802912#comment-14802912 ] Cheng Hao edited comment on SPARK-10474 at 9/17/15 1:48 PM: The root reason

[jira] [Commented] (SPARK-10606) Cube/Rollup/GrpSet doesn't create the correct plan when group by is on something other than an AttributeReference

2015-09-16 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791499#comment-14791499 ] Cheng Hao commented on SPARK-10606: --- [~rhbutani] Which version are you using, actually I've fixed

RE: RE: spark sql hook

2015-09-16 Thread Cheng, Hao
Probably a workable solution is to create your own SQLContext by extending the class HiveContext, override the `analyzer`, and add your own rule to do the hacking. From: r7raul1...@163.com [mailto:r7raul1...@163.com] Sent: Thursday, September 17, 2015 11:08 AM To: Cheng, Hao; user Subject: Re

RE: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Cheng, Hao
We actually met a similar problem in a real case; see https://issues.apache.org/jira/browse/SPARK-10474 After checking the source code, the external sort memory management strategy seems to be the root cause of the issue. Currently, we allocate the 4MB (page size) buffer as initial in the

RE: spark sql hook

2015-09-16 Thread Cheng, Hao
Catalyst TreeNode is very fundamental API, not sure what kind of hook you need. Any concrete example will be more helpful to understand your requirement. Hao From: r7raul1...@163.com [mailto:r7raul1...@163.com] Sent: Thursday, September 17, 2015 10:54 AM To: user Subject: spark sql hook I

[jira] [Commented] (SPARK-4226) SparkSQL - Add support for subqueries in predicates

2015-09-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746642#comment-14746642 ] Cheng Hao commented on SPARK-4226: -- Thank you [~brooks], you're right! I meant it will make more

[jira] [Commented] (SPARK-10474) Aggregation failed with unable to acquire memory

2015-09-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744969#comment-14744969 ] Cheng Hao commented on SPARK-10474: --- The root cause of the exception is that the executor doesn't have

[jira] [Commented] (SPARK-10466) UnsafeRow exception in Sort-Based Shuffle with data spill

2015-09-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744966#comment-14744966 ] Cheng Hao commented on SPARK-10466: --- [~naliazheli] It's an irrelevant issue, you'd better to subscribe

[jira] [Commented] (SPARK-10474) Aggregation failed with unable to acquire memory

2015-09-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745008#comment-14745008 ] Cheng Hao commented on SPARK-10474: --- But from the current implementation, we'd better not to throw

[jira] [Commented] (SPARK-10466) UnsafeRow exception in Sort-Based Shuffle with data spill

2015-09-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744967#comment-14744967 ] Cheng Hao commented on SPARK-10466: --- [~naliazheli] It's an irrelevant issue, you'd better to subscribe

[jira] [Issue Comment Deleted] (SPARK-10466) UnsafeRow exception in Sort-Based Shuffle with data spill

2015-09-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-10466: -- Comment: was deleted (was: [~naliazheli] It's an irrelevant issue, you'd better to subscribe

[jira] [Commented] (SPARK-4226) SparkSQL - Add support for subqueries in predicates

2015-09-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745467#comment-14745467 ] Cheng Hao commented on SPARK-4226: -- [~marmbrus] [~yhuai] After investigating a little bit, I think using
