Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread Ye Xianjin
+1 Sent from my iPhone On Apr 30, 2024, at 3:23 PM, DB Tsai wrote: +1 On Apr 29, 2024, at 8:01 PM, Wenchen Fan wrote: To add more color: Spark data source table and Hive Serde table are both stored in the Hive metastore and keep the data files in the table directory. The only difference is they

Re: Welcome two new Apache Spark committers

2023-08-06 Thread Ye Xianjin
Congratulations! Sent from my iPhone On Aug 7, 2023, at 11:16 AM, Yuming Wang wrote: Congratulations! On Mon, Aug 7, 2023 at 11:11 AM Kent Yao wrote: Congrats! Peter and Xiduo! On Mon, Aug 7, 2023 at 11:01, Cheng Pan wrote: > > Congratulations! Peter and Xiduo! > > Thanks, >

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-14 Thread Ye Xianjin
The ‘…file.upload.path’ configuration is wrong. It names a distributed file system path where your archives/resources/jars are stored temporarily before Spark distributes them to drivers/executors. In your case, you don’t need to set this configuration. Sent from my iPhone On Feb 14, 2023, at 5:43 AM, karan alang

Re: [VOTE] Accept Uniffle into the Apache Incubator

2022-05-30 Thread Ye Xianjin
+1 (non-binding) Sent from my iPhone > On May 31, 2022, at 10:46 AM, Aloys Zhang wrote: > > +1 (non-binding) > > On Tue, May 31, 2022 at 10:12, XiaoYu wrote: > >> +1 (non-binding) >> >> On Tue, May 31, 2022 at 10:07, Xun Liu wrote: >>> >>> +1 (binding) for me. >>> >>> Good luck! >>> >>> On Tue, May 31, 2022 at 10:04

Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Ye Xianjin
+1 (non-binding). Sent from my iPhone > On May 25, 2022, at 9:59 AM, Goson zhang wrote: > > +1 (non-binding) > > Good luck! > > On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote: > >> +1 (non-binding) from me! Good luck! >> >> On 5/24/22, 9:05 AM, "Jerry Shao" wrote: >> >>Hi all, >> >>Due

Re: Random expr in join key not support

2021-10-19 Thread Ye Xianjin
> For that, you can add a table subquery and do it in the select list. Do you mean something like this: select * from t1 join (select floor(random()*9) + id as x from t2) m on t1.id = m.x ? Yes, that works. But that raises another question: these two queries seem semantically equivalent, yet

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-15 Thread Ye Xianjin
Hi, Thanks to Ryan and Wenchen for leading this. I’d like to add my two cents here. In production environments, the function catalog might be used by multiple systems, such as Spark, Presto and Hive. Is it possible that this function catalog is designed as a unified function catalog

Re: Welcoming some new committers and PMC members

2019-09-09 Thread Ye Xianjin
Congratulations! Sent from my iPhone > On Sep 10, 2019, at 9:19 AM, Jeff Zhang wrote: > > Congratulations! > > On Tue, Sep 10, 2019 at 9:16 AM, Saisai Shao wrote: >> Congratulations! >> >> On Mon, Sep 9, 2019 at 6:11 PM, Jungtaek Lim wrote: >>> Congratulations! Well deserved! >>> On Tue, Sep 10, 2019 at 9:51 AM

Re: Why does a 3.8 T dataset take up 11.59 Tb on HDFS

2015-11-24 Thread Ye Xianjin
Hi AlexG: Files (blocks, more specifically) have 3 copies on HDFS by default. So 3.8 * 3 = 11.4 TB. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, November 25, 2015 at 2:31 PM, AlexG wrote: > I downloaded a 3.8 T dataset from S3 to a freshly launched sp
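
The arithmetic in the reply above can be sketched as follows. This is only a minimal illustration in Java, assuming the default replication factor of 3 that the reply mentions; the class and method names are hypothetical. The excerpt does not explain the remaining gap between 11.4 TB and the 11.59 in the subject line, so the sketch does not try to account for it.

```java
// Hypothetical helper illustrating the replication arithmetic from the reply.
final class HdfsUsageSketch {

    // Raw space consumed on HDFS = logical data size x replication factor.
    static double rawUsageTb(double logicalTb, int replicationFactor) {
        return logicalTb * replicationFactor;
    }

    public static void main(String[] args) {
        // 3.8 TB of data stored with 3 replicas of every block.
        double raw = rawUsageTb(3.8, 3);
        System.out.println("Approximate raw usage in TB: " + raw);
    }
}
```

The factor is passed in as a parameter because the replication setting is configurable per cluster and per file; 3 is just the default cited in the reply.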

Re: An interesting and serious problem I encountered

2015-02-13 Thread Ye Xianjin
. This is my calculation based on the Spark SizeEstimator. However, I am not sure how much space an Integer will occupy on a 64-bit JVM with compressed oops on. It should be 12 + 4 = 16 bytes; that would mean the SizeEstimator gives the wrong result. @Sean what do you think? -- Ye Xianjin Sent with Sparrow
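
The "12 + 4 = 16 bytes" figure above can be reproduced with a small layout calculation. This is a sketch under stated assumptions, not SizeEstimator's actual code: it assumes a 12-byte object header on a 64-bit HotSpot JVM with compressed oops (16 bytes without), a single 4-byte int field, and 8-byte object alignment. The class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope layout calculation for a boxed Integer.
final class IntegerSizeSketch {

    // Round size up to the next multiple of alignment.
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Header plus one 4-byte int field, padded to an assumed 8-byte boundary.
    static long boxedIntegerBytes(long headerBytes) {
        long intFieldBytes = 4;
        return alignUp(headerBytes + intFieldBytes, 8);
    }

    public static void main(String[] args) {
        // Assumed header sizes: 12 bytes with compressed oops, 16 without.
        System.out.println("with compressed oops:    " + boxedIntegerBytes(12));
        System.out.println("without compressed oops: " + boxedIntegerBytes(16));
    }
}
```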

Re: Welcoming three new committers

2015-02-03 Thread Ye Xianjin
Congratulations! -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, February 4, 2015 at 6:34 AM, Matei Zaharia wrote: Hi all, The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley and Sean Owen. All three have been major

[jira] [Commented] (SPARK-4631) Add real unit test for MQTT

2015-01-29 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296811#comment-14296811 ] Ye Xianjin commented on SPARK-4631: --- [~dragos], Thread.sleep(50) does pass the test on my

[jira] [Issue Comment Deleted] (SPARK-4631) Add real unit test for MQTT

2015-01-29 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Xianjin updated SPARK-4631: -- Comment: was deleted (was: [~dragos], Thread.sleep(50) do pass the test on my machine. ) Add real

[jira] [Commented] (SPARK-4631) Add real unit test for MQTT

2015-01-29 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296812#comment-14296812 ] Ye Xianjin commented on SPARK-4631: --- [~dragos], Thread.sleep(50) does pass the test on my

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Ye Xianjin
Sean, the MQTTStreamSuite also failed for me on Mac OS X, though I don’t have time to investigate that. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, January 28, 2015 at 9:17 PM, Sean Owen wrote: +1 (nonbinding). I verified that all the hash / signing

[jira] [Commented] (SPARK-4631) Add real unit test for MQTT

2015-01-28 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296232#comment-14296232 ] Ye Xianjin commented on SPARK-4631: --- Hi [~dragos], I have the same issue here. I'd like

Re: Can't run Spark java code from command line

2015-01-13 Thread Ye Xianjin
There is no binding issue here. Spark picks the right IP, 10.211.55.3, for you. The printed message is just an indication. However, I have no idea why spark-shell hangs or stops. Sent from my iPhone On Jan 14, 2015, at 5:10 AM, Akhil Das ak...@sigmoidanalytics.com wrote: It's just a binding issue with the

[jira] [Created] (SPARK-5201) ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range

2015-01-11 Thread Ye Xianjin (JIRA)
Ye Xianjin created SPARK-5201: - Summary: ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range Key: SPARK-5201 URL: https://issues.apache.org/jira/browse/SPARK-5201
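
The excerpt does not show the failing code, so the following is only an assumed illustration of how an int overflow can arise with an inclusive range: converting an inclusive end bound to an exclusive one by adding 1 wraps around when the bound is Integer.MAX_VALUE. The class and method names are hypothetical and are not taken from ParallelCollectionRDD.

```java
// Hypothetical illustration of int overflow at the upper bound of an inclusive range.
final class InclusiveRangeOverflowSketch {

    // Naive conversion in int arithmetic: wraps to Integer.MIN_VALUE at the maximum.
    static int exclusiveEndInt(int inclusiveEnd) {
        return inclusiveEnd + 1;
    }

    // Widening to long before adding avoids the wrap-around.
    static long exclusiveEndLong(int inclusiveEnd) {
        return (long) inclusiveEnd + 1L;
    }

    public static void main(String[] args) {
        int end = Integer.MAX_VALUE;
        System.out.println("int arithmetic:  " + exclusiveEndInt(end));
        System.out.println("long arithmetic: " + exclusiveEndLong(end));
    }
}
```

Widening to long is one common way to sidestep the problem; whether the eventual fix for this issue took that route is not stated in the excerpt.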

[jira] [Commented] (SPARK-5201) ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range

2015-01-11 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273277#comment-14273277 ] Ye Xianjin commented on SPARK-5201: --- I will send a PR

Re: Is it safe to use Scala 2.11 for Spark build?

2014-11-17 Thread Ye Xianjin
: unresolved dependency: org.apache.kafka#kafka_2.11;0.8.0: not found [error] (catalyst/*:update) sbt.ResolveException: unresolved dependency: org.scalamacros#quasiquotes_2.11;2.0.1: not found -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, November 18, 2014

[jira] [Commented] (FLUME-2385) Flume spans log file with Spooling Directory Source runner has shutdown messages at INFO level

2014-11-10 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/FLUME-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205891#comment-14205891 ] Ye Xianjin commented on FLUME-2385: --- Hi [~scaph01], I think (according to my colleague

[jira] [Commented] (SPARK-4002) JavaKafkaStreamSuite.testKafkaStream fails on OSX

2014-10-22 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179753#comment-14179753 ] Ye Xianjin commented on SPARK-4002: --- Hi [~rdub], what's the hostname of your Mac OS X machine? Mine

Re: spark_classpath in core/pom.xml and yarn/porm.xml

2014-09-25 Thread Ye Xianjin
, the SparkConf will throw SparkException. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, September 23, 2014 at 12:56 AM, Ye Xianjin wrote: Hi: I notice the scalatest-maven-plugin sets the SPARK_CLASSPATH environment variable for testing. But in the SparkConf.scala

Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Ye Xianjin
not be used any more. But I don't think it's worth filing a JIRA for such a small change. Maybe put it into another related JIRA. It's a pity that your PR already got merged. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, September 26, 2014 at 6:29 AM, Sandy Ryza

spark_classpath in core/pom.xml and yarn/porm.xml

2014-09-22 Thread Ye Xianjin
Hi: I notice the scalatest-maven-plugin sets the SPARK_CLASSPATH environment variable for testing. But according to SparkConf.scala, this variable is deprecated in Spark 1.0+. So what is this variable for? Should we just remove it? -- Ye Xianjin Sent with Sparrow (http

java_home detection bug in maven 3.2.3

2014-09-18 Thread Ye Xianjin
` fi I wanted to file a JIRA at http://jira.codehaus.org, but it seems it's not open for registration. So I think maybe it's a good idea to send an email here. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

Re: groupBy gives non deterministic results

2014-09-10 Thread Ye Xianjin
Great. And you should ask questions on the user@spark.apache.org mailing list. I believe many people aren't subscribed to the incubator mailing list now. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, September 10, 2014 at 6:03 PM, redocpot wrote: Hi, I am using

Re: groupBy gives non deterministic results

2014-09-10 Thread Ye Xianjin
| Do the two mailing lists share messages? I don't think so. I didn't receive this message from the user list. I am not at Databricks, so I can't answer your other questions. Maybe Davies Liu dav...@databricks.com can answer you? -- Ye Xianjin Sent with Sparrow (http

Re: groupBy gives non deterministic results

2014-09-10 Thread Ye Xianjin
Well, that's weird. I don't see this thread in my mailbox as sent to the user list. Maybe because I am also subscribed to the incubator mailing list? I do see mails sent to the incubator mailing list that no one replies to. I thought it was because people aren't subscribed to the incubator list now. -- Ye Xianjin Sent

Re: groupBy gives non deterministic results

2014-09-09 Thread Ye Xianjin
Can you provide a small sample or test data that reproduces this problem? And what's your env setup? Single node or cluster? Sent from my iPhone On Sep 8, 2014, at 22:29, redocpot julien19890...@gmail.com wrote: Hi, I have a key-value RDD called rdd below. After a groupBy, I tried to count

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
What did you see in the log? Was there anything related to mapreduce? Can you log into your hdfs (data) node, use jps to list all Java processes, and confirm whether there is a tasktracker process (or nodemanager) running alongside the datanode process? -- Ye Xianjin Sent with Sparrow (http

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
): org.apache.hadoop.hdfs.server.datanode.DataNode On Mon, Sep 8, 2014 at 6:39 PM, Ye Xianjin advance...@gmail.com wrote: What did you see in the log? Was there anything related to mapreduce? Can you log into your hdfs (data) node, use jps to list all Java processes, and confirm whether there is a tasktracker

Re: about spark assembly jar

2014-09-02 Thread Ye Xianjin
no comments. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, September 2, 2014 at 4:58 PM, Sean Owen wrote: No, usually you unit-test your changes during development. That doesn't require the assembly. Eventually you may wish to test some change against the complete

[jira] [Commented] (SPARK-3098) In some cases, operation zipWithIndex get a wrong results

2014-09-01 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117558#comment-14117558 ] Ye Xianjin commented on SPARK-3098: --- Hi [~srowen] and [~gq], I think what [~matei

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Ye Xianjin
We just used CDH 4.7 for our production cluster. And I believe we won't use CDH 5 in the next year. Sent from my iPhone On Aug 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com wrote: Personally I'd actually consider putting CDH4 back if there are still users on it. It's always

Re: Too many open files

2014-08-29 Thread Ye Xianjin
need to change this limit on all the cluster nodes or just the master? Thanks On Aug 29, 2014 11:43 AM, Ye Xianjin advance...@gmail.com wrote: A limit of 1024 open files is most likely too small for Linux machines in production. Try setting it to 65536 or unlimited if you can. The too

[jira] [Created] (SPARK-3040) pick up a more proper local ip address for Utils.findLocalIpAddress method

2014-08-14 Thread Ye Xianjin (JIRA)
Ye Xianjin created SPARK-3040: - Summary: pick up a more proper local ip address for Utils.findLocalIpAddress method Key: SPARK-3040 URL: https://issues.apache.org/jira/browse/SPARK-3040 Project: Spark

Re: defaultMinPartitions in textFile

2014-07-21 Thread Ye Xianjin
the defaultParallelism is less than 2... -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, July 22, 2014 at 10:18 AM, Wang, Jensen wrote: Hi, I started to use spark on yarn recently and found a problem while tuning my program. When SparkContext is initialized

[jira] [Created] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
Ye Xianjin created SPARK-2557: - Summary: createTaskScheduler should be consistent between local and local-n-failures Key: SPARK-2557 URL: https://issues.apache.org/jira/browse/SPARK-2557 Project: Spark

[jira] [Commented] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065001#comment-14065001 ] Ye Xianjin commented on SPARK-2557: --- I will send a PR for this. createTaskScheduler

[jira] [Commented] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065029#comment-14065029 ] Ye Xianjin commented on SPARK-2557: --- GitHub PR: https://github.com/apache/spark/pull

Re: Where to set proxy in order to run ./install-dev.sh for SparkR

2014-07-02 Thread Ye Xianjin
You can try setting your HTTP_PROXY environment variable: export HTTP_PROXY=host:port But I don't use Maven. If the env variable doesn't work, please search Google for "maven proxy". I am sure there will be a lot of related results. Sent from my iPhone On Jul 2, 2014, at 19:04, Stuti Awasthi

Re: Set comparison

2014-06-16 Thread Ye Xianjin
If you want a string with quotes, you have to escape the quotes with '\'. That's exactly what you did in the modified version. Sent from my iPhone On Jun 17, 2014, at 5:43, SK skrishna...@gmail.com wrote: In Line 1, I have expected_res as a set of strings with quotes. So I thought it would include the

[jira] [Closed] (SPARK-1511) Update TestUtils.createCompiledClass() API to work with creating class file on different filesystem

2014-04-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Xianjin closed SPARK-1511. - Resolution: Fixed Fix Version/s: 1.0.0 Update TestUtils.createCompiledClass() API to work

[jira] [Created] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Ye Xianjin (JIRA)
Ye Xianjin created SPARK-1527: - Summary: rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1 Key: SPARK-1527 URL: https://issues.apache.org/jira/browse/SPARK-1527 Project

[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973087#comment-13973087 ] Ye Xianjin commented on SPARK-1527: --- Yes. You are right. toString() may give relative
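
The point about toString() can be illustrated with java.io.File, assuming that is the type under discussion (the excerpt is truncated, so this is a guess). A File built from a relative path keeps that relative path in toString(), while getAbsolutePath() resolves it against the current working directory. The class name and the directory name rootDirExample are hypothetical.

```java
import java.io.File;

// Hypothetical illustration: File.toString() returns the path as it was constructed.
final class RelativePathSketch {

    // The path string exactly as given to the File constructor (may be relative).
    static String asGiven(File f) {
        return f.toString();
    }

    // The path resolved against the current working directory.
    static String absolute(File f) {
        return f.getAbsolutePath();
    }

    public static void main(String[] args) {
        File dir = new File("rootDirExample"); // relative path, need not exist
        System.out.println("toString():        " + asGiven(dir));
        System.out.println("getAbsolutePath(): " + absolute(dir));
    }
}
```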

[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973096#comment-13973096 ] Ye Xianjin commented on SPARK-1527: --- Yes, of course, sometimes we want absolute path

[jira] [Updated] (SPARK-1511) Update TestUtils.createCompiledClass() API to work with creating class file on different filesystem

2014-04-16 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Xianjin updated SPARK-1511: -- Affects Version/s: 0.8.1 0.9.0 Update TestUtils.createCompiledClass() API

Re: Tests failed after assembling the latest code from github

2014-04-14 Thread Ye Xianjin
Thank you for your reply. After building the assembly jar, the repl test still failed. The error output is the same as I posted before. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 15, 2014 at 1:39 AM, Michael Armbrust wrote: I believe you may need

Re: Tests failed after assembling the latest code from github

2014-04-14 Thread Ye Xianjin
the TestUtils.scala to first copy the file to dest then delete the original file. The tests go smoothly. Should I file a JIRA about this problem? Then I can send a PR on GitHub. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 15, 2014 at 3:43 AM, Ye Xianjin wrote