Re: XPATH_INT behavior - XML - Function in Spark

2020-05-13 Thread Chetan Khatri
Can anyone please suggest how I can achieve this?

On Tue, May 12, 2020 at 5:35 PM Jeff Evans wrote:
> It sounds like you're expecting the XPath expression to evaluate embedded
> Spark SQL expressions? From the documentation
> ,

Huge difference in speed between pyspark and scalaspark

2020-05-13 Thread Steven Van Ingelgem
Hello all,

We noticed a HUGE difference between using pyspark and spark in scala.

Pyspark runs:
* on my work computer in +- 350 seconds
* on my home computer in +- 130 seconds (Windows Defender enabled)
* on my home computer in +- 105 seconds (Windows Defender disabled)
*

Re: [PySpark] Tagging descriptions

2020-05-13 Thread ZHANG Wei
AFAICT, judging from the data size (25B rows, with a 300-character string in the key cell), this looks like a common Spark job. The regex might be complex, though: I guess there are lots of items to match, such as (apple|banana|cola|...), from the purchase list. Regex matching is a CPU-intensive task. If the current performance with
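
To make the alternation idea concrete, here is a minimal sketch in plain Python (no Spark) of tagging a description against a list of items with one precompiled regex. The item list, the function names `build_pattern` and `tag_description`, and the word-boundary and case-insensitive matching are all assumptions for illustration; none of them come from the original thread.

```python
import re

# Hypothetical item list, for illustration only.
ITEMS = ["apple", "banana", "cola"]


def build_pattern(items):
    """Build one alternation regex such as \\b(?:banana|apple|cola)\\b."""
    # Put longer items first so they are tried before shorter ones,
    # and escape each item so it is matched literally.
    ordered = sorted(items, key=len, reverse=True)
    alternatives = "|".join(re.escape(item) for item in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


# Compile once and reuse for every row; recompiling per row wastes CPU.
PATTERN = build_pattern(ITEMS)


def tag_description(text):
    """Return the distinct matched items, lowercased, in order of first appearance."""
    tags = []
    for match in PATTERN.finditer(text):
        item = match.group(0).lower()
        if item not in tags:
            tags.append(item)
    return tags
```

Calling `tag_description` on each description yields its list of tags. The main design choice in this sketch is compiling the pattern a single time, since the per-row matching cost is what dominates on a large data set.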