Github user justinuang closed the pull request at:
https://github.com/apache/spark/pull/23179
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
GitHub user justinuang opened a pull request:
https://github.com/apache/spark/pull/23179
Fix the rat excludes on .policy.yml
## What changes were proposed in this pull request?
Fix the rat excludes on .policy.yml
You can merge this pull request into a Git repository
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/20877
Sorry, I won't be able to take it over!
GitHub user justinuang opened a pull request:
https://github.com/apache/spark/pull/23051
[AE2.3-02][SPARK-23128] Add QueryStage and the framework for adaptive
execution (auto-setting the number of reducers)
## What changes were proposed in this pull request?
Add QueryStage
Github user justinuang closed the pull request at:
https://github.com/apache/spark/pull/23051
Github user justinuang closed the pull request at:
https://github.com/apache/spark/pull/22968
GitHub user justinuang opened a pull request:
https://github.com/apache/spark/pull/22968
Merge upstream
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22503#discussion_r226386187
--- Diff: sql/core/src/test/resources/test-data/cars-crlf.csv ---
@@ -0,0 +1,7 @@
+
+year,make,model,comment,blank
+"2012",
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/22503
done!
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/22503
So Hadoop's LineReader looks like it handles CR, LF, CRLF:
https://github.com/apache/hadoop/blob/f90c64e6242facf38c2baedeeda42e4a8293e642/hadoop-common-project/hadoop-common/src/main
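The auto-detection that LineReader does can be sketched in plain Python (a simplified illustration of handling CR, LF, and CRLF terminators, not Hadoop's actual implementation):

```python
def split_lines(data: bytes):
    """Split on LF, CR, or CRLF, mimicking the terminator
    auto-detection of a line reader like Hadoop's LineReader."""
    lines, start, i, n = [], 0, 0, len(data)
    while i < n:
        b = data[i]
        if b == 0x0A:              # bare LF
            lines.append(data[start:i])
            i += 1
            start = i
        elif b == 0x0D:            # CR, possibly the start of CRLF
            lines.append(data[start:i])
            i += 1
            if i < n and data[i] == 0x0A:
                i += 1             # consume the LF of a CRLF pair
            start = i
        else:
            i += 1
    if start < n:                  # trailing line with no terminator
        lines.append(data[start:])
    return lines
```

The key detail is that a CR followed by an LF is consumed as one terminator, so mixed-ending files still split correctly.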
Github user justinuang closed the pull request at:
https://github.com/apache/spark/pull/22680
GitHub user justinuang opened a pull request:
https://github.com/apache/spark/pull/22680
[SPARK-25493][SQL] Use auto-detection for CRLF in CSV datasource multiline
mode
## Upstream SPARK-X ticket and PR link (if not applicable, explain)
Went through review
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22503#discussion_r222053706
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -212,6 +212,8 @@ class CSVOptions
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/22503
What does it take to get this to be merged in?
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/22503
Sounds good, thanks guys =)
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/22503
It looks like a flake? Can someone retrigger it?
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96511/console
GitHub user justinuang opened a pull request:
https://github.com/apache/spark/pull/22503
[SPARK-25493] [SQL] Fix multiline crlf
## What changes were proposed in this pull request?
CSVs with Windows-style CRLF (carriage return line feed) line endings don't work in
multiline mode
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/19591
Really looking forward to this PR! For our use case, it will reduce our
spark launch times by ~4 seconds
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/15009
That would be incredible. Launching a new JVM and loading all of Hadoop
takes about 4 extra seconds each time versus reusing the launcher JVM, which
is really significant for us since we launch
Github user justinuang commented on the issue:
https://github.com/apache/spark/pull/15009
@kishorvpatil this will be quite useful for us! To avoid the 3s cost of
spinning up a new JVM just for yarn-cluster
---
If your project is set up for it, you can reply to this email and have
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r41220284
--- Diff: python/setup.py ---
@@ -0,0 +1,18 @@
+#!/usr/bin/env python
+
+from setuptools import setup
+
+exec(compile(open("py
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r41221064
--- Diff: python/pyspark/pyspark_version.py ---
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-144187766
Thanks for the reminder!
Github user justinuang closed the pull request at:
https://github.com/apache/spark/pull/8662
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8833#discussion_r40048503
--- Diff: python/pyspark/sql/functions.py ---
@@ -1414,7 +1414,7 @@ def __init__(self, func, returnType, name=None):
def _create_judf(self, name
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8833#issuecomment-142161491
lgtm! So this avoids deadlock by getting rid of the blocking queue (duh!)
and then assumes the OS buffer will rate limit how much gets buffered on the
writer side
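The design being approved here can be sketched generically (a hand-rolled illustration of pipe backpressure, not the actual PySpark worker code): the producer writes directly into a pipe and blocks only when the OS pipe buffer is full, so no user-level blocking queue is needed and the queue-induced deadlock disappears.

```python
import os
import threading

# Instead of a bounded blocking queue between producer and consumer
# (where both sides can end up blocked on each other), the producer
# writes straight into a pipe and the OS pipe buffer provides the
# backpressure.

def produce(fd, chunks):
    for chunk in chunks:
        os.write(fd, chunk)   # blocks only when the pipe buffer is full
    os.close(fd)

r, w = os.pipe()
payload = [b"x" * 4096 for _ in range(64)]  # 256 KB, more than one pipe buffer
writer = threading.Thread(target=produce, args=(w, payload))
writer.start()

received = bytearray()
while True:
    data = os.read(r, 4096)
    if not data:              # writer closed its end: EOF
        break
    received.extend(data)
writer.join()
os.close(r)
```

Because the payload exceeds a typical 64 KB pipe buffer, the writer thread necessarily blocks partway through and resumes only as the reader drains, which is exactly the rate-limiting behavior described above.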
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8833#discussion_r39933648
--- Diff: python/pyspark/sql/functions.py ---
@@ -1414,7 +1414,7 @@ def __init__(self, func, returnType, name=None):
def _create_judf(self, name
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-141121225
@rxin what do you mean by local iterators =) I feel like I'm missing some
context that you guys have
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-141117878
The solution with the iterator wrapper was my first approach that I
prototyped
(http://apache-spark-developers-list.1001551.n3.nabble.com/Python-UDF-performance
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-141232211
I'm not sure there is a solution that satisfies all the requirements. I can
say that this approach addresses 1,2,4 by design.
Would you guys support a 1.6.0
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8318#issuecomment-140871937
Thanks! Sorry for being demanding, was just hoping to get this into 1.6.0!
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-140920743
@davies how do I have a private class in Python?
In addition, is it possible that the failing unit test is flaky? I ran
./run-tests --python
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-140936346
Hey davies, I think the performance regression for a single UDF may be
because there were multiple threads per task that could potentially be taking
up CPU time. I
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8318#issuecomment-140866466
What is this blocking on?
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-140413982
Jenkins, retest this please
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-140416126
@rxin or @davies why is this automatically not retriggering when I push a
new commit? Also, looks like the "retest this please" only works with
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-140223207
Looks like your intuition was right. The second time it's slightly faster,
so I ran the loop twice and took the second run's numbers
Here are the updated numbers
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-140181530
Sorry for the delay, here is the code I ran and here are the results
from pyspark.sql.functions import udf
from pyspark.sql.types import
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-139331466
Hey davies, I don't have any numbers. Are there any benchmarks that we can
just rerun?
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-139332688
Is there an example of another benchmark? I'm not sure where they're stored
for python
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-139023500
Should the build have started by now?
GitHub user justinuang opened a pull request:
https://github.com/apache/spark/pull/8662
[SPARK-8632] [SQL] [PYSPARK] Poor Python UDF performance because of RDD caching
- I wanted to reuse most of the logic from PythonRDD, so I pulled out
two methods
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8662#issuecomment-138758861
@davies @JoshRosen @rxin
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8615#issuecomment-138326562
I think we are missing some of the references to 0.8.2.1
git grep py4j-
LICENSE:For Py4J (python/lib/py4j-0.8.2.1-src.zip)
bin
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37563567
--- Diff: python/pyspark/__init__.py ---
@@ -36,6 +36,31 @@
Finer-grained cache persistence levels.
+import os
+import sys
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37524370
--- Diff: python/setup.py ---
@@ -0,0 +1,19 @@
+#!/usr/bin/env python
+
+from setuptools import setup
+
+exec(compile(open(pyspark
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37524804
--- Diff: python/pyspark/pyspark_version.py ---
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/8318#issuecomment-132998065
@holdenk , thanks for working on this! Do we have plans to set up PyPI
publishing?
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37570377
--- Diff: python/pyspark/__init__.py ---
@@ -36,6 +36,31 @@
Finer-grained cache persistence levels.
+import os
+import sys
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37570006
--- Diff: python/pyspark/__init__.py ---
@@ -36,6 +36,31 @@
Finer-grained cache persistence levels.
+import os
+import sys
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37570574
--- Diff: python/pyspark/pyspark_version.py ---
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37459103
--- Diff: python/pyspark/__init__.py ---
@@ -36,6 +36,33 @@
Finer-grained cache persistence levels.
+import os
+import sys
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37458924
--- Diff: python/setup.py ---
@@ -0,0 +1,19 @@
+#!/usr/bin/env python
+
+from setuptools import setup
+
+exec(compile(open(pyspark
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37459213
--- Diff: python/pyspark/__init__.py ---
@@ -36,6 +36,33 @@
Finer-grained cache persistence levels.
+import os
+import sys
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/6439#discussion_r31166415
--- Diff: python/run-tests ---
@@ -57,54 +57,54 @@ function run_test() {
function run_core_tests() {
echo Run core tests
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/6439#discussion_r31169697
--- Diff: python/run-tests ---
@@ -57,54 +57,54 @@ function run_test() {
function run_core_tests() {
echo Run core tests
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/5873#issuecomment-98541714
You can consider using set equality for the test, but other than that, it
looks good! Thanks!
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/5601#issuecomment-98386185
Yea, you should try rebasing. It looks like you're not the only one running
into this.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder
Github user justinuang commented on a diff in the pull request:
https://github.com/apache/spark/pull/5601#discussion_r29482941
--- Diff: python/pyspark/ml/tuning.py ---
@@ -0,0 +1,94 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user justinuang commented on the pull request:
https://github.com/apache/spark/pull/3173#issuecomment-74416823
Hi, this looks great! Is there a reason why sort-based join is not in Spark
core, only in Spark SQL?