[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2018-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
👍 


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2018-02-28 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/17671
  
@holdenk 
Mmm...sweet! That may work, and it even makes the integration process more flexible. A Sentry integration wrapper would be trivial with this feature. Thanks!

For future reference: 
https://github.com/apache/spark/commit/afae8f2bc82597593595af68d1aa2d802210ea8b
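
A rough sketch of what such a wrapper might look like, assuming the mechanism from that commit lets a custom module expose the same `main(infile, outfile)` entry point as `pyspark.worker` (the module name, the entry-point contract, and how much actually escapes the stock worker's own error handling are assumptions here, not something this PR defines):

```python
# sentry_worker.py - hypothetical wrapper module around the stock PySpark worker.
from pyspark import worker


def main(infile, outfile):
    try:
        # Delegate to the normal worker loop.
        worker.main(infile, outfile)
    except BaseException:
        # Report whatever escapes the stock worker before letting it propagate.
        # raven must be installed on the executors; the DSN is read from SENTRY_DSN.
        from raven import Client
        Client().captureException()
        raise
```

The executors would then be pointed at this module through the worker-module configuration added in that commit, with `sentry_worker.py` shipped to them (e.g. via `--py-files`).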


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2018-02-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17671
  
So now that we have the configurable worker module, I think this no longer needs to be integrated in this way.

I'm not sure whether Sentry support directly belongs in Spark, but at the very least this could be simplified.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17671
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-15 Thread tedmiston
Github user tedmiston commented on the issue:

https://github.com/apache/spark/pull/17671
  
I checked the mailing list archives but could not find a recent thread on 
Sentry.  Anyone know if there is one I'm missing?


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17671
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/17671
  
Ok, will do. Thanks @HyukjinKwon .


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
Ah, yes. There are good places to discuss this. For example, see "Mailing List" at https://spark.apache.org/community.html. I would also like to see what others think about this.

I have done this a few times. See, for example, [`[discuss][PySpark] Can we drop support old Pandas (<0.19.2) or what version should we support?`](http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-PySpark-Can-we-drop-support-old-Pandas-lt-0-19-2-or-what-version-should-we-support-td22834.html) and [`[discuss][SQL] Partitioned column type inference proposal`](http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-SQL-Partitioned-column-type-inference-proposal-td22838.html).


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/17671
  
@HyukjinKwon 
Sorry, I'm a bit lost. What email? With a link to this PR to gather more 
opinions? 


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
I am a Python developer and I use PySpark in production. @kxepal, would you mind sending an email to the dev mailing list to open this discussion?


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/17671
  
@HyukjinKwon 
The specific reason is to simplify debugging and make errors easier to understand by integrating PySpark with one of the most popular error-tracking systems among Python developers - in other words, to improve the user experience.

It's not a maintenance issue, since you never know when and how your production jobs will crash, or whether you could even reproduce the issue to track down and fix the bug. You want this integration on all the time.

What you propose is to do that on the application side. How many UDFs would I have to rewrap to make it work? How many times would I have to explain this custom magic to newcomers? How many times would I have to copy-paste that solution between projects? I think that approach doesn't scale well and brings no fun to PySpark development, especially when you can do it once on the PySpark side at no cost.

Could trying this patch with Sentry change your mind?


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
Also, there are many good Python tools - debuggers, coverage, etc. I think providing a mechanism for such things is a great idea if possible, but I doubt support for each one should be added while workarounds are relatively easy and available.

I personally don't like monkey-patching in general, but it's fine for working around a few specific cases, as long as it is not excessive and is well understood. PySpark also uses it in a few places.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
What are the specific reasons then? I think it's basically a maintenance concern. I don't think the diff would look much different if this were handled on the application side.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/17671
  
> If this is the reason to add support for a third-party library, it sounds not quite compelling. I think you can even just simply monkey-patch udf or UserDefinedFunction. It wouldn't be too difficult.

No, the main reason is to greatly improve the debugging experience for PySpark UDFs without a lot of code changes. The PySpark worker is the perfect place to handle those errors.

I don't think monkey-patching is a good way to go. It's basically hackery, which is fragile and can eventually break. And you'd have to copy-paste it from project to project to get good error reporting.

Compare this with simply installing the error-reporting package on the worker side (raven for this PR) and passing at least one configuration option via SparkConf - that's enough to have all your errors captured.
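
As a very rough sketch of that setup (the configuration key below is a placeholder, not the option this PR actually adds; raven also has to be installed on every executor, with the DSN exported as SENTRY_DSN there):

```python
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("sentry-enabled-job")
    # Placeholder key standing in for whatever option this PR introduces.
    .set("spark.python.worker.sentry.enabled", "true")
)
sc = SparkContext(conf=conf)
```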

> I wonder if we could maybe make a mechanism for this that would be useful 
beyond just sentry but also things like connecting Python debuggers

That would be great, but I'm not familiar with error-management systems other than Sentry. We can start with the few we have now (Sentry will cover most Python users) and figure out something else later, like plugins via entry points provided by setuptools / pkg_resources - in this case 

---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
Yea, having a mechanism for things like this makes much more sense to me.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17671
  
So we could do the monkey patching for UDFs, but for reduce/aggregate/combineByKey it starts to get a little rough.

I wonder if we could maybe make a mechanism for this that would be useful beyond just Sentry - also for things like connecting Python debuggers (since we don't have the same JMX facilities on the Python side)?


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
To me, for a similar reason - https://github.com/apache/spark/pull/17671#pullrequestreview-81100230 - I am less sure about adding this.

For the workaround you described, @kxepal - https://github.com/apache/spark/pull/17671#issuecomment-298768608:

> This was found as very inconvenient way since you'll have to always wrap all your functions. Easy to forget to do.

If this is the reason to add support for a third-party library, it sounds not quite compelling. I think you could even just monkey-patch `udf` or `UserDefinedFunction`. It wouldn't be too difficult.

This gives me the impression that there are many easy workarounds, and I would rather avoid the overhead of maintaining such cases and focus on more core work.

It might be best to just share documentation and the monkey patches on the mailing list.
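
For illustration, a minimal sketch of that monkey patch (the wrapper below is an assumption about how one might do it, not an official PySpark or raven API; it only handles the plain `udf(f, returnType)` call form, not the decorator form):

```python
import functools

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

_original_udf = F.udf  # keep a handle on the real factory


def _reporting(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Exception:
            # Imported inside the wrapper so it resolves on the executor;
            # assumes raven is installed there and SENTRY_DSN is set.
            from raven import Client
            Client().captureException()
            raise
    return wrapper


def _patched_udf(f, returnType=StringType()):
    # Wrap the user function, then hand it to the original udf factory.
    return _original_udf(_reporting(f), returnType)


F.udf = _patched_udf  # the monkey patch itself
```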


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17671
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17671
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84476/
Test PASSed.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17671
  
**[Test build #84476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84476/testReport)** for PR 17671 at commit [`3a3301e`](https://github.com/apache/spark/commit/3a3301e820d65524b82024c04c339298eba5071c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17671
  
**[Test build #84476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84476/testReport)** for PR 17671 at commit [`3a3301e`](https://github.com/apache/spark/commit/3a3301e820d65524b82024c04c339298eba5071c).


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
ok to test


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17671
  
Thanks @holdenk. Sure, I am interested in this. Will take a look shortly, by tomorrow.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17671
  
Jenkins OK to test.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-05 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/17671
  
Rebased to resolve conflicts.

@holdenk could you take a look please? Is there anything else that needs to be done?


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-12-04 Thread tedmiston
Github user tedmiston commented on the issue:

https://github.com/apache/spark/pull/17671
  
Anyone know what the status of this PR is?  I'd really like to be able to 
run Sentry on my PySpark code in production.


---




[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-05-02 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/17671
  
Hm...I'd read about broadcast variables but never tried to use them. However, after a quick look and a try, I found that this won't change things much.

Yes, you can pass a client instance to all the executors, but you still have to modify every UDF and all the other functions to capture exceptions with the Sentry client, wrapping each body in `try: ... except: raven_client.captureException()`. And if we have lambdas, we'd have to rewrite those completely.

In the best case, this could be reduced to a decorator that takes care of all these routines, but you still have to remember to use it every time. You can also easily hit the same issue I did with the default threaded Sentry client transport: in some cases it isn't able to send the exception to the service before pyspark.worker calls `sys.exit(1)`. Such gotchas are quite hard to catch.

This approach may be good from a design point of view, but the goal of simplifying the PySpark development experience would not be reached. Well, at least we can do better than that.
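
For illustration, a sketch of that decorator (the names are made up, and the synchronous HTTP transport is used here on the assumption that it avoids the `sys.exit(1)` race described above):

```python
import functools


def capture_to_sentry(f):
    """Wrap a function so executor-side exceptions are reported before re-raising."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Exception:
            from raven import Client
            from raven.transport.http import HTTPTransport

            # Synchronous transport: the event is sent before the worker exits,
            # sidestepping the threaded-transport race mentioned above.
            Client(transport=HTTPTransport).captureException()
            raise
    return wrapper


# Every UDF and lambda still has to remember to opt in, which is exactly
# the inconvenience being discussed:
@capture_to_sentry
def parse_amount(value):
    return float(value)
```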


---



[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-05-02 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17671
  
Thanks for working on improving the PySpark debugging experience. Perhaps a similar effect could be achieved by adding a broadcast variable which lazily loads Sentry?


---



[GitHub] spark issue #17671: [SPARK-20368][PYSPARK] Provide optional support for Sent...

2017-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17671
  
Can one of the admins verify this patch?


---