+1
Sent from my iPhone
On Apr 30, 2024, at 3:23 PM, DB Tsai wrote:
+1
On Apr 29, 2024, at 8:01 PM, Wenchen Fan wrote:
To add more color: Spark data source table and Hive Serde table are both stored in the Hive metastore and keep the data files in the table directory. The only difference is they
Congratulations!
Sent from my iPhone
On Aug 7, 2023, at 11:16 AM, Yuming Wang wrote:
Congratulations!
On Mon, Aug 7, 2023 at 11:11 AM Kent Yao wrote:
Congrats! Peter and Xiduo!
On Mon, Aug 7, 2023 at 11:01, Cheng Pan wrote:
>
> Congratulations! Peter and Xiduo!
>
> Thanks,
>
The configuration of ‘…file.upload.path’ is wrong. It specifies a distributed FS path where your archives/resources/jars are stored temporarily before Spark distributes them to the drivers/executors. For your case, you don’t need to set this configuration.
Sent from my iPhone
On Feb 14, 2023, at 5:43 AM, karan alang
+1 (non-binding)
Sent from my iPhone
> On May 31, 2022, at 10:46 AM, Aloys Zhang wrote:
>
> +1 (non-binding)
>
> On Tue, May 31, 2022 at 10:12, XiaoYu wrote:
>
>> +1 (non-binding)
>>
>> On Tue, May 31, 2022 at 10:07, Xun Liu wrote:
>>>
>>> +1 (binding) for me.
>>>
>>> Good luck!
>>>
>>> On Tue, May 31, 2022 at 10:04
+1 (non-binding).
Sent from my iPhone
> On May 25, 2022, at 9:59 AM, Goson zhang wrote:
>
> +1 (non-binding)
>
> Good luck!
>
> On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote:
>
>> +1 (non-binding) from me! Good luck!
>>
>> On 5/24/22, 9:05 AM, "Jerry Shao" wrote:
>>
>>Hi all,
>>
>>Due
> For that, you can add a table subquery and do it in the select list.
Do you mean something like this:
select * from t1 join (select floor(random()*9) + id as x from t2) m on t1.id =
m.x ?
Yes, that works. But that raises another question: these two queries seem
semantically equivalent, yet
Hi,
Thanks to Ryan and Wenchen for leading this.
I’d like to add my two cents here. In production environments, the function
catalog might be used by multiple systems, such as Spark, Presto and Hive. Is
it possible for this function catalog to be designed as a unified function
catalog
Congratulations!
Sent from my iPhone
> On Sep 10, 2019, at 9:19 AM, Jeff Zhang wrote:
>
> Congratulations!
>
> On Tue, Sep 10, 2019 at 9:16 AM, Saisai Shao wrote:
>> Congratulations!
>>
>> On Mon, Sep 9, 2019 at 6:11 PM, Jungtaek Lim wrote:
>>> Congratulations! Well deserved!
>>>
On Tue, Sep 10, 2019 at 9:51 AM
Hi AlexG:
Files (blocks, more specifically) have 3 copies on HDFS by default. So 3.8 TB * 3 =
11.4 TB.
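The arithmetic above can be checked with a short Python sketch. The function name is illustrative only, and the default of 3 is the HDFS default replication factor mentioned in the message; a cluster configured differently would give a different figure.

```python
def raw_hdfs_usage_tb(logical_size_tb: float, replication: int = 3) -> float:
    """Raw disk space consumed when every block is stored `replication` times."""
    return logical_size_tb * replication


# 3.8 TB of data with the default replication factor of 3
usage = raw_hdfs_usage_tb(3.8)
print(f"{usage:.1f} TB")  # prints "11.4 TB"
```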
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, November 25, 2015 at 2:31 PM, AlexG wrote:
> I downloaded a 3.8 T dataset from S3 to a freshly launched sp
.
This is my calculation based on the Spark SizeEstimator.
However, I am not sure how much space an Integer occupies on a 64-bit JVM with
compressed oops on. It should be 12 + 4 = 16 bytes, which would mean the
SizeEstimator gives the wrong result. @Sean, what do you think?
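As a rough check of the 12 + 4 = 16 figure, the Python sketch below adds an object header to the field size and rounds up to the object alignment. The constants are assumptions about a typical 64-bit HotSpot JVM (12-byte header with compressed oops, 16-byte header without, 8-byte alignment), and the helper is hypothetical; it is not Spark's SizeEstimator.

```python
def aligned_object_size(header_bytes: int, field_bytes: int, alignment: int = 8) -> int:
    """Header plus fields, rounded up to the next multiple of `alignment`."""
    raw = header_bytes + field_bytes
    return (raw + alignment - 1) // alignment * alignment


INT_FIELD_BYTES = 4  # a boxed Integer holds a single 4-byte int field

# Assumed header sizes: 12 bytes with compressed oops, 16 bytes without.
with_compressed_oops = aligned_object_size(12, INT_FIELD_BYTES)
without_compressed_oops = aligned_object_size(16, INT_FIELD_BYTES)

print(with_compressed_oops)     # 16
print(without_compressed_oops)  # 24
```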
--
Ye Xianjin
Sent with Sparrow
Congratulations!
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, February 4, 2015 at 6:34 AM, Matei Zaharia wrote:
Hi all,
The PMC recently voted to add three new committers: Cheng Lian, Joseph
Bradley and Sean Owen. All three have been major
[
https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296811#comment-14296811
]
Ye Xianjin commented on SPARK-4631:
---
[~dragos], Thread.sleep(50) does pass the test on my
[
https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ye Xianjin updated SPARK-4631:
--
Comment: was deleted
(was: [~dragos], Thread.sleep(50) does pass the test on my machine.)
Add real
[
https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296812#comment-14296812
]
Ye Xianjin commented on SPARK-4631:
---
[~dragos], Thread.sleep(50) does pass the test on my
Sean,
the MQTTStreamSuite also failed for me on Mac OS X, though I don’t have time
to investigate that.
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, January 28, 2015 at 9:17 PM, Sean Owen wrote:
+1 (nonbinding). I verified that all the hash / signing
[
https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296232#comment-14296232
]
Ye Xianjin commented on SPARK-4631:
---
Hi [~dragos], I have the same issue here. I'd like
There is no binding issue here. Spark picks the right IP, 10.211.55.3, for you.
The printed message is just an indication.
However, I have no idea why spark-shell hangs or stops.
Sent from my iPhone
On Jan 14, 2015, at 5:10 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
It's just a binding issue with the
Ye Xianjin created SPARK-5201:
-
Summary: ParallelCollectionRDD.slice(seq, numSlices) has int
overflow when dealing with inclusive range
Key: SPARK-5201
URL: https://issues.apache.org/jira/browse/SPARK-5201
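To illustrate the kind of overflow the summary describes, the Python sketch below emulates 32-bit signed arithmetic. It assumes, hypothetically, that an inclusive range is handled by adding 1 to its end to obtain an exclusive bound; that is one common way such a bug arises and is not necessarily what ParallelCollectionRDD.slice does. All names here are illustrative.

```python
INT_MAX = 2**31 - 1
INT_MIN = -2**31


def to_int32(value: int) -> int:
    """Wrap an arbitrary Python int to a 32-bit signed value, like a JVM Int."""
    return (value - INT_MIN) % 2**32 + INT_MIN


def exclusive_end(inclusive_end: int) -> int:
    """Hypothetical conversion of an inclusive end to an exclusive one."""
    return to_int32(inclusive_end + 1)


print(exclusive_end(10))       # 11
print(exclusive_end(INT_MAX))  # -2147483648, the bound wraps around
```

A range ending at the maximum Int value would therefore appear to end before it starts, which is why inclusive ranges need separate handling or wider arithmetic.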
[
https://issues.apache.org/jira/browse/SPARK-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273277#comment-14273277
]
Ye Xianjin commented on SPARK-5201:
---
I will send a pr
: unresolved dependency:
org.apache.kafka#kafka_2.11;0.8.0: not found
[error] (catalyst/*:update) sbt.ResolveException: unresolved dependency:
org.scalamacros#quasiquotes_2.11;2.0.1: not found
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, November 18, 2014
[
https://issues.apache.org/jira/browse/FLUME-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205891#comment-14205891
]
Ye Xianjin commented on FLUME-2385:
---
hi, [~scaph01], I think (according to my colleague
[
https://issues.apache.org/jira/browse/SPARK-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179753#comment-14179753
]
Ye Xianjin commented on SPARK-4002:
---
Hi [~rdub], what's your Mac OS X hostname? Mine
, the SparkConf will throw
SparkException.
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, September 23, 2014 at 12:56 AM, Ye Xianjin wrote:
Hi:
I notice the scalatest-maven-plugin sets the SPARK_CLASSPATH environment
variable for testing. But in the SparkConf.scala
not be used any more. But
I don't think it's worth filing a JIRA for such a small change. Maybe put it
into another related JIRA. It's a pity that your PR
already got merged.
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Friday, September 26, 2014 at 6:29 AM, Sandy Ryza
Hi:
I notice the scalatest-maven-plugin sets the SPARK_CLASSPATH environment
variable for testing. But in SparkConf.scala, this is deprecated in Spark
1.0+.
So what is this variable for? Should we just remove it?
--
Ye Xianjin
Sent with Sparrow (http
`
fi
I wanted to file a JIRA issue at http://jira.codehaus.org
(http://jira.codehaus.org/), but it seems it's not open for registration. So I
think maybe it's a good idea to send an email here.
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
Great. And you should ask questions on the user@spark.apache.org mailing list. I
believe many people don't subscribe to the incubator mailing list now.
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, September 10, 2014 at 6:03 PM, redocpot wrote:
Hi,
I am using
| Do the two mailing lists share messages ?
I don't think so. I didn't receive this message from the user list. I am not
at Databricks, so I can't answer your other questions. Maybe Davies Liu
dav...@databricks.com can answer you?
--
Ye Xianjin
Sent with Sparrow (http
Well, that's weird. I don't see this thread in my mailbox as sent to the user
list. Maybe because I also subscribe to the incubator mailing list? I do see mails
sent to the incubator mailing list that no one replies to. I thought it was because
people don't subscribe to the incubator list now.
--
Ye Xianjin
Sent
Can you provide a small sample or test data that reproduces this problem? And
what's your env setup? Single node or cluster?
Sent from my iPhone
On Sep 8, 2014, at 22:29, redocpot julien19890...@gmail.com wrote:
Hi,
I have a key-value RDD called rdd below. After a groupBy, I tried to count
What did you see in the log? Was there anything related to MapReduce?
Can you log into your HDFS (data) node, use jps to list all Java processes and
confirm whether there is a TaskTracker process (or NodeManager) running alongside
the DataNode process?
--
Ye Xianjin
Sent with Sparrow (http
):
org.apache.hadoop.hdfs.server.datanode.DataNode
On Mon, Sep 8, 2014 at 6:39 PM, Ye Xianjin advance...@gmail.com wrote:
What did you see in the log? Was there anything related to MapReduce?
Can you log into your HDFS (data) node, use jps to list all Java processes and
confirm whether there is a TaskTracker
no comments.
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, September 2, 2014 at 4:58 PM, Sean Owen wrote:
No, usually you unit-test your changes during development. That
doesn't require the assembly. Eventually you may wish to test some
change against the complete
[
https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117558#comment-14117558
]
Ye Xianjin commented on SPARK-3098:
---
hi, [~srowen] and [~gq], I think what [~matei
We just used CDH 4.7 for our production cluster. And I believe we won't use CDH
5 in the next year.
Sent from my iPhone
On Aug 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com wrote:
Personally I'd actually consider putting CDH4 back if there are still users
on it. It's always
need to change this limit on all
the cluster nodes or just the master?
Thanks
On Aug 29, 2014 11:43 AM, Ye Xianjin advance...@gmail.com wrote:
A limit of 1024 open files is most likely too small for Linux
machines in production. Try setting it to 65536 or unlimited if you can. The too
Ye Xianjin created SPARK-3040:
-
Summary: pick up a more proper local ip address for
Utils.findLocalIpAddress method
Key: SPARK-3040
URL: https://issues.apache.org/jira/browse/SPARK-3040
Project: Spark
the defaultParallelism is less
than 2...
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, July 22, 2014 at 10:18 AM, Wang, Jensen wrote:
Hi,
I started to use spark on yarn recently and found a problem while
tuning my program.
When SparkContext is initialized
Ye Xianjin created SPARK-2557:
-
Summary: createTaskScheduler should be consistent between local
and local-n-failures
Key: SPARK-2557
URL: https://issues.apache.org/jira/browse/SPARK-2557
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065001#comment-14065001
]
Ye Xianjin commented on SPARK-2557:
---
I will send a pr for this.
createTaskScheduler
[
https://issues.apache.org/jira/browse/SPARK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065029#comment-14065029
]
Ye Xianjin commented on SPARK-2557:
---
Github pr: https://github.com/apache/spark/pull
You can try setting your HTTP_PROXY environment variable.
export HTTP_PROXY=host:port
But I don't use Maven. If the env variable doesn't work, please search Google
for "maven proxy". I am sure there will be a lot of related results.
Sent from my iPhone
On Jul 2, 2014, at 19:04, Stuti Awasthi
If you want a string with quotes, you have to escape them with '\'. That's exactly
what you did in the modified version.
Sent from my iPhone
On Jun 17, 2014, at 5:43, SK skrishna...@gmail.com wrote:
In Line 1, I have expected_res as a set of strings with quotes. So I thought
it would include the
[
https://issues.apache.org/jira/browse/SPARK-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ye Xianjin closed SPARK-1511.
-
Resolution: Fixed
Fix Version/s: 1.0.0
Update TestUtils.createCompiledClass() API to work
Ye Xianjin created SPARK-1527:
-
Summary: rootDirs in DiskBlockManagerSuite doesn't get full path
from rootDir0, rootDir1
Key: SPARK-1527
URL: https://issues.apache.org/jira/browse/SPARK-1527
Project
[
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973087#comment-13973087
]
Ye Xianjin commented on SPARK-1527:
---
Yes. You are right. toString() may give relative
[
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973096#comment-13973096
]
Ye Xianjin commented on SPARK-1527:
---
Yes, of course, sometimes we want absolute path
[
https://issues.apache.org/jira/browse/SPARK-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ye Xianjin updated SPARK-1511:
--
Affects Version/s: 0.8.1
0.9.0
Update TestUtils.createCompiledClass() API
Thank you for your reply.
After building the assembly jar, the repl test still failed. The error output
is the same as I posted before.
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, April 15, 2014 at 1:39 AM, Michael Armbrust wrote:
I believe you may need
the TestUtils.scala to first copy the file to dest then delete the
original file. The tests go smoothly.
Should I file a JIRA issue about this problem? Then I can send a PR on GitHub.
--
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, April 15, 2014 at 3:43 AM, Ye Xianjin wrote