Re: Spark ml how to extract split points from trained decision tree mode

2020-06-11 Thread AaronLee
@srowen, you are totally right: the model was not trained correctly. But it is weird, as the dataset I used actually has 50m rows. It has a binary label with 20% positives and 1 feature in the feature vector. I do not understand why it was not trained correctly. ``` scala> df2.count res56: Long =

Re: Spark ml how to extract split points from trained decision tree mode

2020-06-11 Thread Sean Owen
Hm, the root is a leaf? It's possible, but that means there are no splits. If it's a toy example, it could be. This was just off the top of my head from looking at the code, so I could be missing something, but a non-trivial tree should start with an InternalNode. On Thu, Jun 11, 2020 at 11:01 PM AaronLee

Re: Spark ml how to extract split points from trained decision tree mode

2020-06-11 Thread AaronLee
Thanks, srowen. I also checked https://www.programcreek.com/scala/org.apache.spark.ml.tree.InternalNode. Splits are available via the ".split" attribute of "InternalNode". But "dtm.rootNode" is a "LeafNode". ``` scala> dtm.rootNode res9: org.apache.spark.ml.tree.Node = LeafNode(prediction = 0.0,
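A quick way to confirm which case a trained model is in is to pattern-match on `rootNode` and report the model's `depth` and `numNodes`. This is a minimal sketch, assuming a trained `DecisionTreeClassificationModel` like the `dtm` above; the `InspectRoot` object and `describeRoot` method are hypothetical names. A `LeafNode` root means the tree has depth 0 and there are no split points to extract.

```scala
import org.apache.spark.ml.classification.DecisionTreeClassificationModel
import org.apache.spark.ml.tree.{InternalNode, LeafNode}

// Hypothetical helper: summarise whether a trained tree made any splits at all.
object InspectRoot {
  def describeRoot(model: DecisionTreeClassificationModel): String =
    model.rootNode match {
      // Leaf root: the whole tree is a single prediction, nothing to extract.
      case leaf: LeafNode =>
        s"root is a leaf (prediction = ${leaf.prediction}), depth = ${model.depth}, numNodes = ${model.numNodes}"
      // Internal root: splits exist and can be read from .split on each InternalNode.
      case node: InternalNode =>
        s"root splits on feature ${node.split.featureIndex}, depth = ${model.depth}, numNodes = ${model.numNodes}"
    }
}
```

Usage would be `println(InspectRoot.describeRoot(dtm))`.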

Re: Unsubscribe martha focker

2020-06-11 Thread hashbonduo
When these Matha Fockers don't even know how to unsubscribe, what hope is there of them becoming data scientists? I mean, first you have to train on some maths (algebra, statistics, calculus) from people who have no idea of data science or machine learning. Classifier algorithms, recommended

Re: Spark ml how to extract split points from trained decision tree mode

2020-06-11 Thread Sean Owen
You should be able to look at dtm.rootNode and, treating it as an InternalNode, get the .split from it. On Thu, Jun 11, 2020 at 7:02 PM AaronLee wrote: > I am following official spark 2.4.3 tutorial >
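Extending that suggestion from the root to the whole tree, a minimal sketch: walk every `Node` depth-first and, for each `InternalNode` whose `split` is a `ContinuousSplit`, record its `featureIndex` and `threshold`. The `SplitPoints` object and `continuousSplits` method are hypothetical names. Categorical splits are skipped here because they carry a set of left-hand categories rather than a single threshold.

```scala
import org.apache.spark.ml.tree.{CategoricalSplit, ContinuousSplit, InternalNode, LeafNode, Node}

// Hypothetical helper: collect (featureIndex, threshold) for every continuous split.
object SplitPoints {
  def continuousSplits(node: Node): Seq[(Int, Double)] = node match {
    case n: InternalNode =>
      val here: Seq[(Int, Double)] = n.split match {
        // Rows with features(featureIndex) <= threshold go to the left child.
        case c: ContinuousSplit => Seq((c.featureIndex, c.threshold))
        // Categorical splits have no single threshold, so they are skipped.
        case _: CategoricalSplit => Seq.empty
      }
      // Pre-order traversal: this node, then the left subtree, then the right subtree.
      here ++ continuousSplits(n.leftChild) ++ continuousSplits(n.rightChild)
    case _: LeafNode => Seq.empty
  }
}
```

For a single-feature model like `dtm`, `SplitPoints.continuousSplits(dtm.rootNode).map(_._2).distinct.sorted` would list the cut points on that feature; for a leaf root it returns an empty sequence.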

Spark ml how to extract split points from trained decision tree mode

2020-06-11 Thread AaronLee
I am following the official Spark 2.4.3 tutorial and trained a decision tree model. How can I extract the split points from the trained model? // model val dt = new DecisionTreeClassifier()
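For context, a self-contained sketch of training such a model on a tiny in-memory dataset and printing it. It assumes plain `label`/`features` column names and hypothetical toy data rather than the tutorial's indexed columns; the `TrainToyTree` object is likewise a hypothetical name. `toDebugString` shows each split as text (`If (feature i <= threshold)`), which is a quick check before extracting the split points programmatically.

```scala
import org.apache.spark.ml.classification.{DecisionTreeClassificationModel, DecisionTreeClassifier}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object TrainToyTree {
  // Train a tree on hypothetical toy data: one feature, perfectly separable at the middle.
  def train(spark: SparkSession): DecisionTreeClassificationModel = {
    val df = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(1.0)),
      (0.0, Vectors.dense(2.0)),
      (0.0, Vectors.dense(3.0)),
      (1.0, Vectors.dense(7.0)),
      (1.0, Vectors.dense(8.0)),
      (1.0, Vectors.dense(9.0))
    )).toDF("label", "features")

    new DecisionTreeClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setMaxDepth(3)
      .fit(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("toy-tree").getOrCreate()
    val model = train(spark)
    // Prints the tree, one "If (feature i <= threshold)" line per split.
    println(model.toDebugString)
    spark.stop()
  }
}
```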

[External] Unsubscribe

2020-06-11 Thread Mishra, Dhiraj A.
Thanks, Dhiraj

Unsubscribe

2020-06-11 Thread Angel Angel

Re: Broadcast join data reuse

2020-06-11 Thread Ankur Srivastava
Hi Tyson, The broadcast variable should remain in memory on the executors and be reused unless you unpersist or destroy it, or it goes out of scope. Hope this helps. Thanks, Ankur On Wed, Jun 10, 2020 at 5:28 PM wrote: > We have a case where data the is small enough to be broadcasted in joined >
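The lifecycle described in this reply can be sketched with an explicit broadcast variable: one small lookup table is broadcast once and read by two separate jobs, then released with `unpersist()` and `destroy()`. This is a minimal sketch with hypothetical names (`BroadcastReuse`, `useTwice`) and toy data; it illustrates broadcast variables as the reply discusses them, and does not show how a DataFrame broadcast join manages its own broadcast across queries.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Hypothetical example: one broadcast lookup table used by two separate jobs.
object BroadcastReuse {
  def useTwice(sc: SparkContext): (Array[String], Array[Int]) = {
    // Ship a small map to the executors once.
    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))
    val keys = sc.parallelize(Seq(1, 2, 1))

    // Both jobs read lookup.value; the copy cached on each executor is reused.
    val names = keys.map(k => lookup.value.getOrElse(k, "?")).collect()
    val sizes = keys.map(_ => lookup.value.size).collect()

    // unpersist() drops the executor copies (they would be re-sent if used again);
    // destroy() releases the broadcast entirely, after which it cannot be used.
    lookup.unpersist()
    lookup.destroy()
    (names, sizes)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("broadcast-reuse").getOrCreate()
    val (names, sizes) = useTwice(spark.sparkContext)
    println(names.mkString(",")) // a,b,a
    println(sizes.mkString(",")) // 2,2,2
    spark.stop()
  }
}
```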

Re: Arrow RecordBatches/Pandas Dataframes to (Arrow enabled) Spark Dataframe conversion in streaming fashion

2020-06-11 Thread Tanveer Ahmad - EWI
Hi Jorge, Thank you. This union function is a better alternative for my work. Regards, Tanveer Ahmad From: Jorge Machado Sent: Monday, May 25, 2020 3:56:04 PM To: Tanveer Ahmad - EWI Cc: Spark Group Subject: Re: Arrow RecordBatches/Pandas Dataframes to (Arrow
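The thread itself concerns PySpark and Pandas batches; as a rough Scala illustration of the union approach, a sketch that folds a sequence of same-schema batch DataFrames into one. The `UnionBatches` object and `unionAll` method are hypothetical names, and the toy batches stand in for whatever each incoming batch produces. `union` matches columns by position; `unionByName` (Spark 2.3+) matches them by name instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: fold same-schema batches into a single DataFrame.
object UnionBatches {
  // union resolves columns by position, so every batch must share the same column order.
  // reduce throws on an empty sequence, so at least one batch is required.
  def unionAll(batches: Seq[DataFrame]): DataFrame = batches.reduce(_ union _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("union-batches").getOrCreate()
    val batches = Seq(spark.range(0, 3).toDF("id"), spark.range(3, 5).toDF("id"))
    println(unionAll(batches).count()) // 5
    spark.stop()
  }
}
```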