[jira] [Comment Edited] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
[ https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821639#comment-16821639 ]

Praveen edited comment on SPARK-27465 at 4/19/19 10:16 AM:
-----------------------------------------------------------

Hi Shahid, can you please let me know if you have any update on this issue? The error seems to occur because the jar kafka_2.11-0.11.0.0.jar does not contain the Time class. Can you suggest how we can get it?

was (Author: jastipraveen):
Hi Shahid, can you please let me know if you have any update on this issue? The error seems to occur because the jar kafka_2.11-0.11.0.0.jar does not contain the Time class. Can you suggest how we can get it?

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> ------------------------------------------------------------------
>
>                 Key: SPARK-27465
>                 URL: https://issues.apache.org/jira/browse/SPARK-27465
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>            Reporter: Praveen
>            Priority: Critical
>
> Hi Team,
> We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "org.apache.spark.streaming.kafka010.KafkaTestUtils", and the Spark Streaming version is 2.2.3 and above.
>
> ERROR:
> java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
> at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
> at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
> at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
> at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
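The NoSuchMethodError above is a binary-compatibility symptom rather than a missing-jar problem: the spark-streaming-kafka-0-10 test utilities in Spark 2.x were compiled against a Kafka 0.10.x broker, where the KafkaServer constructor's second default argument is of type kafka.utils.Time; later Kafka releases reworked that class, so linking the embedded server against kafka_2.11-0.11.0.0 fails at runtime. A hedged sketch of the usual workaround is to pin the test-scoped broker jar to the version Spark was built against (the coordinates below are assumptions based on the report, not taken from a resolved build):

```xml
<!-- Sketch of a Maven workaround (assumed coordinates): keep the embedded
     broker used by KafkaTestUtils at the Kafka version that
     spark-streaming-kafka-0-10 was compiled against, even if the
     application itself targets a newer kafka-clients. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.11</artifactId>
  <version>0.10.0.1</version>
  <scope>test</scope>
</dependency>
```

Scoping the pin to `test` limits the older broker to the embedded-Kafka tests, so production code can still depend on a newer client if the two JVMs never share a classpath.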
[jira] [Comment Edited] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
[ https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821639#comment-16821639 ]

Praveen edited comment on SPARK-27465 at 4/19/19 9:39 AM:
----------------------------------------------------------

Hi Shahid, can you please let me know if you have any update on this issue? The error seems to occur because the jar kafka_2.11-0.11.0.0.jar does not contain the Time class. Can you suggest how we can get it?

was (Author: jastipraveen):
Hi Shahid, can you please let me know if you have any update on this issue?

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> ------------------------------------------------------------------
>
>                 Key: SPARK-27465
>                 URL: https://issues.apache.org/jira/browse/SPARK-27465
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>            Reporter: Praveen
>            Priority: Critical
>
> Hi Team,
> We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "org.apache.spark.streaming.kafka010.KafkaTestUtils", and the Spark Streaming version is 2.2.3 and above.
>
> ERROR:
> java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
> at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
> at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
> at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
> at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)
[jira] [Commented] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
[ https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821639#comment-16821639 ]

Praveen commented on SPARK-27465:
---------------------------------

Hi Shahid, can you please let me know if you have any update on this issue?

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> ------------------------------------------------------------------
>
>                 Key: SPARK-27465
>                 URL: https://issues.apache.org/jira/browse/SPARK-27465
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>            Reporter: Praveen
>            Priority: Critical
>
> Hi Team,
> We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "org.apache.spark.streaming.kafka010.KafkaTestUtils", and the Spark Streaming version is 2.2.3 and above.
>
> ERROR:
> java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
> at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
> at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
> at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
> at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)
[jira] [Updated] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
[ https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Praveen updated SPARK-27465:
----------------------------
    Description:
Hi Team,
We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "org.apache.spark.streaming.kafka010.KafkaTestUtils", and the Spark Streaming version is 2.2.3 and above.

ERROR:
java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)

  was:
Hi Team,
We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "import org.apache.spark.streaming.kafka010.KafkaTestUtils;"

ERROR:
java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> ------------------------------------------------------------------
>
>                 Key: SPARK-27465
>                 URL: https://issues.apache.org/jira/browse/SPARK-27465
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>            Reporter: Praveen
>            Priority: Critical
>
> Hi Team,
> We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "org.apache.spark.streaming.kafka010.KafkaTestUtils", and the Spark Streaming version is 2.2.3 and above.
>
> ERROR:
> java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
> at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
> at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
> at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
> at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)
[jira] [Updated] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
[ https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Praveen updated SPARK-27465:
----------------------------
    Affects Version/s: 2.3.0
                       2.3.1
                       2.3.2
                       2.4.0
                       2.4.1

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> ------------------------------------------------------------------
>
>                 Key: SPARK-27465
>                 URL: https://issues.apache.org/jira/browse/SPARK-27465
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>            Reporter: Praveen
>            Priority: Critical
>
> Hi Team,
> We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "import org.apache.spark.streaming.kafka010.KafkaTestUtils;"
>
> ERROR:
> java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
> at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
> at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
> at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
> at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)
[jira] [Updated] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
[ https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Praveen updated SPARK-27465:
----------------------------
    Priority: Critical  (was: Major)

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> ------------------------------------------------------------------
>
>                 Key: SPARK-27465
>                 URL: https://issues.apache.org/jira/browse/SPARK-27465
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.3.3
>            Reporter: Praveen
>            Priority: Critical
>
> Hi Team,
> We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "import org.apache.spark.streaming.kafka010.KafkaTestUtils;"
>
> ERROR:
> java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
> at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
> at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
> at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
> at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
> at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)
[jira] [Created] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
Praveen created SPARK-27465:
-------------------------------

             Summary: Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
                 Key: SPARK-27465
                 URL: https://issues.apache.org/jira/browse/SPARK-27465
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.3.3
            Reporter: Praveen

Hi Team,
We are getting the below exceptions with Kafka Client version 0.11.0.0 for the KafkaTestUtils package, but it works fine when we use Kafka Client version 0.10.0.1. Please suggest the way forward. We are using the package "import org.apache.spark.streaming.kafka010.KafkaTestUtils;"

ERROR:
java.lang.NoSuchMethodError: kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
at org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
at org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
at org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
at com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
at com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)
[jira] [Updated] (SPARK-26738) Pyspark random forest classifier feature importance with column names
[ https://issues.apache.org/jira/browse/SPARK-26738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Praveen updated SPARK-26738:
----------------------------
    Description:
I am trying to plot the feature importances of a random forest classifier with column names. I am using Spark 2.3.2 and PySpark.

The input X is sentences, and I am using TF-IDF (HashingTF + IDF) + StringIndexer to generate the feature vectors. I have included all the stages in a Pipeline:

{code:java}
regexTokenizer = RegexTokenizer(gaps=False, inputCol=raw_data_col, outputCol="words", pattern="[a-zA-Z_]+", toLowercase=True, minTokenLength=minimum_token_size)
hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=number_of_feature)
idf = IDF(inputCol="rawFeatures", outputCol=feature_vec_col)
indexer = StringIndexer(inputCol=label_col_name, outputCol=label_vec_name)
converter = IndexToString(inputCol='prediction', outputCol="original_label", labels=indexer.fit(df).labels)
feature_pipeline = Pipeline(stages=[regexTokenizer, hashingTF, idf, indexer])
estimator = RandomForestClassifier(labelCol=label_col, featuresCol=features_col, numTrees=100)
pipeline = Pipeline(stages=[feature_pipeline, estimator, converter])
model = pipeline.fit(df)
{code}

Generating the feature importances as

{code:java}
rdc = model.stages[-2]
print(rdc.featureImportances)
{code}

So far so good, but when I try to map the feature importances to the feature columns as below

{code:java}
attrs = sorted((attr["idx"], attr["name"]) for attr in (chain(*df_pred.schema["featurescol"].metadata["ml_attr"]["attrs"].values())))
[(name, rdc.featureImportances[idx]) for idx, name in attrs if dtModel_1.featureImportances[idx]]
{code}

I get a KeyError on ml_attr:

{code:java}
KeyError: 'ml_attr'{code}

I printed the dictionary,

{code:java}
print(df_pred.schema["featurescol"].metadata){code}

and it's empty: {}. Any thoughts on what I am doing wrong? How can I map the feature importances to the column names?

Thanks

  was:
I am trying to plot the feature importances of a random forest classifier with column names. I am using Spark 2.3.2 and PySpark.

The input X is sentences, and I am using TF-IDF (HashingTF + IDF) + StringIndexer to generate the feature vectors. I have included all the stages in a Pipeline:

{{regexTokenizer = RegexTokenizer(gaps=False, inputCol=raw_data_col, outputCol="words", pattern="[a-zA-Z_]+", toLowercase=True, minTokenLength=minimum_token_size) hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=number_of_feature) idf = IDF(inputCol="rawFeatures", outputCol=feature_vec_col) indexer = StringIndexer(inputCol=label_col_name, outputCol=label_vec_name) converter = IndexToString(inputCol='prediction', outputCol="original_label", labels=indexer.fit(df).labels) feature_pipeline = Pipeline(stages=[regexTokenizer, hashingTF, idf, indexer]) estimator = RandomForestClassifier(labelCol=label_col, featuresCol=features_col, numTrees=100) pipeline = Pipeline(stages=[feature_pipeline, estimator, converter]) model = pipeline.fit(df)}}

Generating the feature importances as

{code:java}
rdc = model.stages[-2]
print(rdc.featureImportances)
{code}

So far so good, but when I try to map the feature importances to the feature columns as below

{code:java}
attrs = sorted((attr["idx"], attr["name"]) for attr in (chain(*df_pred.schema["featurescol"].metadata["ml_attr"]["attrs"].values())))
[(name, rdc.featureImportances[idx]) for idx, name in attrs if dtModel_1.featureImportances[idx]]
{code}

I get a KeyError on ml_attr:

{code:java}
KeyError: 'ml_attr'{code}

I printed the dictionary,

{code:java}
print(df_pred.schema["featurescol"].metadata){code}

and it's empty: {}. Any thoughts on what I am doing wrong? How can I map the feature importances to the column names?

Thanks

> Pyspark random forest classifier feature importance with column names
> ---------------------------------------------------------------------
>
>                 Key: SPARK-26738
>                 URL: https://issues.apache.org/jira/browse/SPARK-26738
>             Project: Spark
>          Issue Type: Question
>          Components: ML
>    Affects Versions: 2.3.2
>         Environment: {code:java}
> {code}
>            Reporter: Praveen
>            Priority: Major
>              Labels: RandomForest, pyspark
>
> I am trying to plot the feature importances of a random forest classifier with column names. I am using Spark 2.3.2 and PySpark.
> The input X is sentences, and I am using TF-IDF (HashingTF + IDF) + StringIndexer to generate the feature vectors.
> I have included all the stages in a Pipeline
>
> {code:java}
> regexTokenizer = RegexTokenizer(gaps=False, inputCol=raw_data_col, outputCol="words", pattern="[a-zA-Z_]+", toLowercase=True, minTokenLength=minimum_token_size)
> hashingTF = HashingTF(inputCol="words",
[jira] [Created] (SPARK-26738) Pyspark random forest classifier feature importance with column names
Praveen created SPARK-26738:
-------------------------------

             Summary: Pyspark random forest classifier feature importance with column names
                 Key: SPARK-26738
                 URL: https://issues.apache.org/jira/browse/SPARK-26738
             Project: Spark
          Issue Type: Question
          Components: ML
    Affects Versions: 2.3.2
         Environment: {code:java}
{code}
            Reporter: Praveen

I am trying to plot the feature importances of a random forest classifier with column names. I am using Spark 2.3.2 and PySpark.

The input X is sentences, and I am using TF-IDF (HashingTF + IDF) + StringIndexer to generate the feature vectors. I have included all the stages in a Pipeline:

{code:java}
regexTokenizer = RegexTokenizer(gaps=False, inputCol=raw_data_col, outputCol="words", pattern="[a-zA-Z_]+", toLowercase=True, minTokenLength=minimum_token_size)
hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=number_of_feature)
idf = IDF(inputCol="rawFeatures", outputCol=feature_vec_col)
indexer = StringIndexer(inputCol=label_col_name, outputCol=label_vec_name)
converter = IndexToString(inputCol='prediction', outputCol="original_label", labels=indexer.fit(df).labels)
feature_pipeline = Pipeline(stages=[regexTokenizer, hashingTF, idf, indexer])
estimator = RandomForestClassifier(labelCol=label_col, featuresCol=features_col, numTrees=100)
pipeline = Pipeline(stages=[feature_pipeline, estimator, converter])
model = pipeline.fit(df)
{code}

Generating the feature importances as

{code:java}
rdc = model.stages[-2]
print(rdc.featureImportances)
{code}

So far so good, but when I try to map the feature importances to the feature columns as below

{code:java}
attrs = sorted((attr["idx"], attr["name"]) for attr in (chain(*df_pred.schema["featurescol"].metadata["ml_attr"]["attrs"].values())))
[(name, rdc.featureImportances[idx]) for idx, name in attrs if dtModel_1.featureImportances[idx]]
{code}

I get a KeyError on ml_attr:

{code:java}
KeyError: 'ml_attr'{code}

I printed the dictionary,

{code:java}
print(df_pred.schema["featurescol"].metadata){code}

and it's empty: {}. Any thoughts on what I am doing wrong? How can I map the feature importances to the column names?

Thanks
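A likely explanation for the empty metadata in the report above: Spark attaches per-feature names ("ml_attr" attributes) to a vector column only when the upstream stages can supply them (e.g. VectorAssembler over named columns), and HashingTF buckets are anonymous by construction, so its output carries no names to map importances onto. The extraction pattern itself is sound when the metadata is present. Below is a self-contained sketch using a hand-built metadata dict in the shape Spark uses (the dict contents are illustrative assumptions, not taken from the report), with a guard for the empty case that raised the reporter's KeyError:

```python
from itertools import chain

# Illustrative metadata dict mimicking what PySpark stores in
# df.schema["features"].metadata for a vector column that carries
# ML attributes (e.g. built by VectorAssembler from named columns).
metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
            "binary": [{"idx": 1, "name": "gender_male"}],
        },
        "num_attrs": 3,
    }
}

# Same (idx, name) extraction pattern as in the report, but guarded:
# .get() returns {} instead of raising KeyError when there is no
# "ml_attr" entry, which is what happens with HashingTF features.
attrs_meta = metadata.get("ml_attr", {}).get("attrs", {})
attrs = sorted((a["idx"], a["name"]) for a in chain(*attrs_meta.values()))
print(attrs)  # [(0, 'age'), (1, 'gender_male'), (2, 'income')]
```

With `attrs` in hand, pairing names with `featureImportances[idx]` works as in the report; for hashed features the practical options are to keep a separate token-to-bucket mapping or to switch to a stage (such as CountVectorizer) whose vocabulary provides names.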
[jira] [Created] (SPARK-23768) Proxy configuration for extraJavaOptions in defaults conf
Praveen created SPARK-23768:
-------------------------------

             Summary: Proxy configuration for extraJavaOptions in defaults conf
                 Key: SPARK-23768
                 URL: https://issues.apache.org/jira/browse/SPARK-23768
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell
    Affects Versions: 2.2.1
         Environment: default conf setting:
spark.executor.extraJavaOptions -Dhttp.proxyHost=IP -Dhttp.proxyPort=8080 -Dhttps.proxyHost=IP -Dhttps.proxyPort=8080
spark.jars.packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11
            Reporter: Praveen

When launching spark-shell or pyspark for the first time with the Cassandra connector configured as a package, the proxy setting in the defaults conf is not used and the download fails (if behind a proxy). If the proxy is configured directly on the command line, the download succeeds and spark-shell starts correctly. It seems the proxy configuration in the defaults conf is not used for package downloads.
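The reported behaviour is consistent with where the download actually runs: `spark.jars.packages` is resolved (via Ivy) in the launcher/driver JVM before any executor exists, so `spark.executor.extraJavaOptions` never reaches the process doing the download. A hedged sketch of the command-line workaround the reporter alludes to, with placeholder proxy host and port (not values from the report):

```
# Sketch (assumed proxy host/port): pass the proxy settings to the JVM
# that performs the Ivy package resolution by putting them on the
# spark-shell command line rather than in executor options.
spark-shell \
  --driver-java-options "-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080" \
  --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11
```

Once the package is cached in the local Ivy repository (~/.ivy2 by default), subsequent launches no longer need the proxy for that artifact.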