Spark ML on Python has short memory?

2019-03-16 Thread Saif Addin
Hi, We're working on Spark NLP by including multiple ML Estimators and Transformers. We are seeing a performance hit on the Python side because the columns are being recalculated recursively (and more than recursively) on each stage.transform() call. I am not able to trace the root of the

Spark Streaming: schema mismatch using MicroBatchReader with columns pruning

2019-03-16 Thread kineret M
I have the same problem as described in the following StackOverflow question (which nobody has answered): https://stackoverflow.com/questions/51103634/spark-streaming-schema-mismatch-using-microbatchreader-with-columns-pruning Any idea how to solve it (using Spark 2.3)? Thanks,

Masking username in Spark with regexp_replace and reverse functions

2019-03-16 Thread Mich Talebzadeh
Hi, I am looking at the Description column of a bank statement (CSV download) that has the following format:
scala> account_table.printSchema
root
 |-- TransactionDate: date (nullable = true)
 |-- TransactionType: string (nullable = true)
 |-- Description: string (nullable = true)
 |-- Value: double