[jira] [Comment Edited] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package

2019-04-19 Thread Praveen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821639#comment-16821639
 ] 

Praveen edited comment on SPARK-27465 at 4/19/19 10:16 AM:
---

Hi Shahid,

Can you please let me know if you have any update on this issue?

It seems the error occurs because the jar kafka_2.11-0.11.0.0.jar does not contain 
the Time class. Can you suggest how we can get it?
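
For anyone wanting to confirm that diagnosis, here is a minimal editorial sketch (not from the original report) that checks which Time class a Kafka jar actually ships. Kafka 0.11.0.0 removed the Scala kafka.utils.Time trait in favor of org.apache.kafka.common.utils.Time, which is why code compiled against 0.10.x fails at runtime; the jar paths below are placeholders:

{code:python}
# Check whether a jar contains a given class file.
import zipfile

def has_class(jar_path: str, class_name: str) -> bool:
    with zipfile.ZipFile(jar_path) as jar:
        return class_name.replace(".", "/") + ".class" in jar.namelist()

print(has_class("kafka_2.11-0.10.0.1.jar", "kafka.utils.Time"))  # expected: True
print(has_class("kafka_2.11-0.11.0.0.jar", "kafka.utils.Time"))  # expected: False
{code}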


was (Author: jastipraveen):
Hi Shahid,

Can you please let me know if you have any update on this issue?

It seems the error occurs because the jar kafka_2.11-0.11.0.0.jar does not contain 
the Time class. Can you suggest how we can get it?

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> --
>
> Key: SPARK-27465
> URL: https://issues.apache.org/jira/browse/SPARK-27465
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Praveen
>Priority: Critical
>
> Hi Team,
> We are getting the below exception with Kafka Client version 0.11.0.0 for the 
> KafkaTestUtils package, but it works fine when we use Kafka Client 
> version 0.10.0.1. Please suggest the way forward. We are using the package 
> "org.apache.spark.streaming.kafka010.KafkaTestUtils",
> and the Spark Streaming version is 2.2.3 and above.
>  
> ERROR:
> java.lang.NoSuchMethodError: 
> kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
>  at 
> com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
>  at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
>  at 
> com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)






[jira] [Comment Edited] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package

2019-04-19 Thread Praveen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821639#comment-16821639
 ] 

Praveen edited comment on SPARK-27465 at 4/19/19 9:39 AM:
--

Hi Shahid,

Can you please let me know if you have any update on this issue?

It seems the error occurs because the jar kafka_2.11-0.11.0.0.jar does not contain 
the Time class. Can you suggest how we can get it?


was (Author: jastipraveen):
Hi Shahid,

Can you please let me know if you have any update on this issue?

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> --
>
> Key: SPARK-27465
> URL: https://issues.apache.org/jira/browse/SPARK-27465
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Praveen
>Priority: Critical
>
> Hi Team,
> We are getting the below exception with Kafka Client version 0.11.0.0 for the 
> KafkaTestUtils package, but it works fine when we use Kafka Client 
> version 0.10.0.1. Please suggest the way forward. We are using the package 
> "org.apache.spark.streaming.kafka010.KafkaTestUtils",
> and the Spark Streaming version is 2.2.3 and above.
>  
> ERROR:
> java.lang.NoSuchMethodError: 
> kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
>  at 
> com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
>  at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
>  at 
> com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)






[jira] [Commented] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package

2019-04-18 Thread Praveen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821639#comment-16821639
 ] 

Praveen commented on SPARK-27465:
-

Hi Shahid,

Can you please let me know if you have any update on this issue?

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> --
>
> Key: SPARK-27465
> URL: https://issues.apache.org/jira/browse/SPARK-27465
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Praveen
>Priority: Critical
>
> Hi Team,
> We are getting the below exception with Kafka Client version 0.11.0.0 for the 
> KafkaTestUtils package, but it works fine when we use Kafka Client 
> version 0.10.0.1. Please suggest the way forward. We are using the package 
> "org.apache.spark.streaming.kafka010.KafkaTestUtils",
> and the Spark Streaming version is 2.2.3 and above.
>  
> ERROR:
> java.lang.NoSuchMethodError: 
> kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
>  at 
> com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
>  at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
>  at 
> com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)






[jira] [Updated] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package

2019-04-16 Thread Praveen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen updated SPARK-27465:

Description: 
Hi Team,

We are getting the below exception with Kafka Client version 0.11.0.0 for the 
KafkaTestUtils package, but it works fine when we use Kafka Client 
version 0.10.0.1. Please suggest the way forward. We are using the package

"org.apache.spark.streaming.kafka010.KafkaTestUtils"

and the Spark Streaming version is 2.2.3 and above.

 

ERROR:

java.lang.NoSuchMethodError: 
kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
 at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
 at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
 at 
com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
 at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
 at 
com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)
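
A note on reading the first frame of that trace (an editorial aside, not from the report): Scala mangles non-identifier characters in synthetic method names, so the missing method is the compiler-generated accessor for a constructor default argument. A small sketch that decodes it:

{code:python}
# "$less" encodes "<" and "$greater" encodes ">", so the missing method is
# KafkaServer$.<init>$default$2: the synthetic accessor for the second default
# constructor argument of kafka.server.KafkaServer. Its declared return type,
# kafka.utils.Time, was removed in Kafka 0.11.0.0, hence the NoSuchMethodError
# when KafkaTestUtils compiled against Kafka 0.10.x runs against the 0.11 jar.
mangled = "kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;"
print(mangled.replace("$less", "<").replace("$greater", ">"))
# kafka.server.KafkaServer$.<init>$default$2()Lkafka/utils/Time;
{code}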

  was:
Hi Team,

We are getting the below exception with Kafka Client version 0.11.0.0 for the 
KafkaTestUtils package, but it works fine when we use Kafka Client 
version 0.10.0.1. Please suggest the way forward. We are using

"import org.apache.spark.streaming.kafka010.KafkaTestUtils;"

 

ERROR:

java.lang.NoSuchMethodError: 
kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
 at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
 at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
 at 
com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
 at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
 at 
com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)


> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> --
>
> Key: SPARK-27465
> URL: https://issues.apache.org/jira/browse/SPARK-27465
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Praveen
>Priority: Critical
>
> Hi Team,
> We are getting the below exception with Kafka Client version 0.11.0.0 for the 
> KafkaTestUtils package, but it works fine when we use Kafka Client 
> version 0.10.0.1. Please suggest the way forward. We are using the package 
> "org.apache.spark.streaming.kafka010.KafkaTestUtils",
> and the Spark Streaming version is 2.2.3 and above.
>  
> ERROR:
> java.lang.NoSuchMethodError: 
> kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
>  at 
> com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
>  at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
>  at 
> com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)





[jira] [Updated] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package

2019-04-15 Thread Praveen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen updated SPARK-27465:

Affects Version/s: 2.3.0
   2.3.1
   2.3.2
   2.4.0
   2.4.1

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> --
>
> Key: SPARK-27465
> URL: https://issues.apache.org/jira/browse/SPARK-27465
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Praveen
>Priority: Critical
>
> Hi Team,
> We are getting the below exception with Kafka Client version 0.11.0.0 for the 
> KafkaTestUtils package, but it works fine when we use Kafka Client 
> version 0.10.0.1. Please suggest the way forward. We are using 
> "import org.apache.spark.streaming.kafka010.KafkaTestUtils;"
>  
> ERROR:
> java.lang.NoSuchMethodError: 
> kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
>  at 
> com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
>  at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
>  at 
> com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)






[jira] [Updated] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package

2019-04-15 Thread Praveen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen updated SPARK-27465:

Priority: Critical  (was: Major)

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> --
>
> Key: SPARK-27465
> URL: https://issues.apache.org/jira/browse/SPARK-27465
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.3
>Reporter: Praveen
>Priority: Critical
>
> Hi Team,
> We are getting the below exception with Kafka Client version 0.11.0.0 for the 
> KafkaTestUtils package, but it works fine when we use Kafka Client 
> version 0.10.0.1. Please suggest the way forward. We are using 
> "import org.apache.spark.streaming.kafka010.KafkaTestUtils;"
>  
> ERROR:
> java.lang.NoSuchMethodError: 
> kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
>  at 
> com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
>  at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
>  at 
> com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)






[jira] [Created] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package

2019-04-15 Thread Praveen (JIRA)
Praveen created SPARK-27465:
---

 Summary: Kafka Client 0.11.0.0 is not Supporting the 
kafkatestutils package
 Key: SPARK-27465
 URL: https://issues.apache.org/jira/browse/SPARK-27465
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 2.3.3
Reporter: Praveen


Hi Team,

We are getting the below exception with Kafka Client version 0.11.0.0 for the 
KafkaTestUtils package, but it works fine when we use Kafka Client 
version 0.10.0.1. Please suggest the way forward. We are using

"import org.apache.spark.streaming.kafka010.KafkaTestUtils;"

 

ERROR:

java.lang.NoSuchMethodError: 
kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
 at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
 at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
 at 
org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
 at 
com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
 at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
 at 
com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)






[jira] [Updated] (SPARK-26738) Pyspark random forest classifier feature importance with column names

2019-01-26 Thread Praveen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen updated SPARK-26738:

Description: 
I am trying to plot the feature importances of a random forest classifier 
with their column names. I am using Spark 2.3.2 and PySpark.

The input X is sentences, and I am using TF-IDF (HashingTF + IDF) plus 
StringIndexer to generate the feature vectors.

I have included all the stages in a Pipeline:

 
{code:java}
regexTokenizer = RegexTokenizer(gaps=False, inputCol= raw_data_col, outputCol= 
"words", pattern="[a-zA-Z_]+", toLowercase=True, 
minTokenLength=minimum_token_size)
hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", 
numFeatures=number_of_feature)
idf = IDF(inputCol="rawFeatures", outputCol= feature_vec_col)
indexer = StringIndexer(inputCol= label_col_name, outputCol= label_vec_name)
converter = IndexToString(inputCol='prediction', outputCol="original_label", 
labels=indexer.fit(df).labels)
feature_pipeline = Pipeline(stages=[regexTokenizer, hashingTF, idf, indexer])
estimator = RandomForestClassifier(labelCol=label_col, 
featuresCol=features_col, numTrees=100)
pipeline = Pipeline(stages=[feature_pipeline, estimator, converter])
model = pipeline.fit(df)
{code}
Generating the feature importances as
{code:java}
rdc = model.stages[-2]
print (rdc.featureImportances)
{code}
So far so good, but when I try to map the feature importances to the feature 
columns as below:
{code:java}
from itertools import chain

attrs = sorted(
    (attr["idx"], attr["name"])
    for attr in chain(*df_pred.schema["featurescol"].metadata["ml_attr"]["attrs"].values())
)

[(name, rdc.featureImportances[idx])
 for idx, name in attrs
 if rdc.featureImportances[idx]]{code}
 

I get a KeyError on ml_attr:
{code:java}
KeyError: 'ml_attr'{code}
I printed the dictionary,
{code:java}
print(df_pred.schema["featurescol"].metadata){code}
and it's empty: {}
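
(Editorial aside: an empty dict is expected here. HashingTF produces hashed feature indices and writes no per-column "ml_attr" metadata into the schema, so there are no names to recover. A minimal workaround sketch, assuming the pipeline above is refit with CountVectorizer in place of HashingTF + IDF; tokenized_df and rdc stand for the tokenizer output and the refit forest model:)

{code:python}
# Map feature-importance indices back to tokens via CountVectorizer's
# explicit vocabulary, which HashingTF does not keep.
from pyspark.ml.feature import CountVectorizer

cv = CountVectorizer(inputCol="words", outputCol="rawFeatures",
                     vocabSize=number_of_feature)
cv_model = cv.fit(tokenized_df)
names = cv_model.vocabulary  # index i in the feature vector is names[i]

importances = sorted(
    ((names[i], float(v))
     for i, v in enumerate(rdc.featureImportances.toArray()) if v),
    key=lambda t: -t[1],
)
{code}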

Any thoughts on what I am doing wrong? How can I map the feature importances 
to the column names?

Thanks

  was:
I am trying to plot the feature importances of a random forest classifier 
with their column names. I am using Spark 2.3.2 and PySpark.

The input X is sentences, and I am using TF-IDF (HashingTF + IDF) plus 
StringIndexer to generate the feature vectors.

I have included all the stages in a Pipeline:

 

 

{code:java}
regexTokenizer = RegexTokenizer(gaps=False, inputCol= raw_data_col, outputCol= 
"words", pattern="[a-zA-Z_]+", toLowercase=True, 
minTokenLength=minimum_token_size)
hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", 
numFeatures=number_of_feature)
idf = IDF(inputCol="rawFeatures", outputCol= feature_vec_col)
indexer = StringIndexer(inputCol= label_col_name, outputCol= label_vec_name)
converter = IndexToString(inputCol='prediction', outputCol="original_label", 
labels=indexer.fit(df).labels)
feature_pipeline = Pipeline(stages=[regexTokenizer, hashingTF, idf, indexer])
estimator = RandomForestClassifier(labelCol=label_col, 
featuresCol=features_col, numTrees=100)
pipeline = Pipeline(stages=[feature_pipeline, estimator, converter])
model = pipeline.fit(df)
{code}

 

 

Generating the feature importances as

 
{code:java}
rdc = model.stages[-2]
print (rdc.featureImportances)
{code}
So far so good, but when I try to map the feature importances to the feature 
columns as below:
{code:java}
from itertools import chain

attrs = sorted(
    (attr["idx"], attr["name"])
    for attr in chain(*df_pred.schema["featurescol"].metadata["ml_attr"]["attrs"].values())
)

[(name, rdc.featureImportances[idx])
 for idx, name in attrs
 if rdc.featureImportances[idx]]{code}
 

I get a KeyError on ml_attr:
{code:java}
KeyError: 'ml_attr'{code}
I printed the dictionary,
{code:java}
print(df_pred.schema["featurescol"].metadata){code}
and it's empty: {}

Any thoughts on what I am doing wrong? How can I map the feature importances 
to the column names?

Thanks


> Pyspark random forest classifier feature importance with column names
> -
>
> Key: SPARK-26738
> URL: https://issues.apache.org/jira/browse/SPARK-26738
> Project: Spark
>  Issue Type: Question
>  Components: ML
>Affects Versions: 2.3.2
> Environment: {code:java}
>  {code}
>Reporter: Praveen
>Priority: Major
>  Labels: RandomForest, pyspark
>
> I am trying to plot the feature importances of a random forest classifier 
> with their column names. I am using Spark 2.3.2 and PySpark.
> The input X is sentences, and I am using TF-IDF (HashingTF + IDF) plus 
> StringIndexer to generate the feature vectors.
> I have included all the stages in a Pipeline:
>  
> {code:java}
> regexTokenizer = RegexTokenizer(gaps=False, inputCol= raw_data_col, 
> outputCol= "words", pattern="[a-zA-Z_]+", toLowercase=True, 
> minTokenLength=minimum_token_size)
> hashingTF = HashingTF(inputCol="words", 

[jira] [Created] (SPARK-26738) Pyspark random forest classifier feature importance with column names

2019-01-26 Thread Praveen (JIRA)
Praveen created SPARK-26738:
---

 Summary: Pyspark random forest classifier feature importance with 
column names
 Key: SPARK-26738
 URL: https://issues.apache.org/jira/browse/SPARK-26738
 Project: Spark
  Issue Type: Question
  Components: ML
Affects Versions: 2.3.2
 Environment: {code:java}
 {code}
Reporter: Praveen


I am trying to plot the feature importances of a random forest classifier 
with their column names. I am using Spark 2.3.2 and PySpark.

The input X is sentences, and I am using TF-IDF (HashingTF + IDF) plus 
StringIndexer to generate the feature vectors.

I have included all the stages in a Pipeline:

 

 

{code:java}
regexTokenizer = RegexTokenizer(gaps=False, inputCol= raw_data_col, outputCol= 
"words", pattern="[a-zA-Z_]+", toLowercase=True, 
minTokenLength=minimum_token_size)
hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", 
numFeatures=number_of_feature)
idf = IDF(inputCol="rawFeatures", outputCol= feature_vec_col)
indexer = StringIndexer(inputCol= label_col_name, outputCol= label_vec_name)
converter = IndexToString(inputCol='prediction', outputCol="original_label", 
labels=indexer.fit(df).labels)
feature_pipeline = Pipeline(stages=[regexTokenizer, hashingTF, idf, indexer])
estimator = RandomForestClassifier(labelCol=label_col, 
featuresCol=features_col, numTrees=100)
pipeline = Pipeline(stages=[feature_pipeline, estimator, converter])
model = pipeline.fit(df)
{code}

 

 

Generating the feature importances as

 
{code:java}
rdc = model.stages[-2]
print (rdc.featureImportances)
{code}
So far so good, but when I try to map the feature importances to the feature 
columns as below:
{code:java}
from itertools import chain

attrs = sorted(
    (attr["idx"], attr["name"])
    for attr in chain(*df_pred.schema["featurescol"].metadata["ml_attr"]["attrs"].values())
)

[(name, rdc.featureImportances[idx])
 for idx, name in attrs
 if rdc.featureImportances[idx]]{code}
 

I get a KeyError on ml_attr:
{code:java}
KeyError: 'ml_attr'{code}
I printed the dictionary,
{code:java}
print(df_pred.schema["featurescol"].metadata){code}
and it's empty: {}

Any thoughts on what I am doing wrong? How can I map the feature importances 
to the column names?

Thanks






[jira] [Created] (SPARK-23768) Proxy configuration for extraJavaOptions in defaults conf

2018-03-22 Thread Praveen (JIRA)
Praveen created SPARK-23768:
---

 Summary: Proxy configuration for extraJavaOptions in defaults conf
 Key: SPARK-23768
 URL: https://issues.apache.org/jira/browse/SPARK-23768
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 2.2.1
 Environment: default conf setting:

spark.executor.extraJavaOptions    -Dhttp.proxyHost=IP -Dhttp.proxyPort=8080 
-Dhttps.proxyHost=IP -Dhttps.proxyPort=8080

spark.jars.packages                
datastax:spark-cassandra-connector:2.0.0-M2-s_2.11

 
Reporter: Praveen


When trying to launch spark-shell or pyspark for the first time with the Cassandra 
connector configured as a package, the proxy setting in the defaults conf is not 
used and the download fails (if behind a proxy).

If the proxy is configured directly on the command line, the download happens 
and spark-shell starts correctly.

It seems the proxy configuration in the defaults conf is not used for the package 
download.
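
That is consistent with how --packages resolution works: the dependencies are fetched by the spark-submit launcher JVM itself, before spark.executor.extraJavaOptions from spark-defaults.conf is applied to any process. A hedged workaround sketch, assuming the launcher honors the SPARK_SUBMIT_OPTS environment variable (IP stands for the proxy host, as in the report):

{code:python}
# Pass the JVM proxy flags to the launcher process via SPARK_SUBMIT_OPTS,
# so the spark.jars.packages download (which runs in the launcher, not in
# the executors) goes through the proxy.
import os
import subprocess

env = dict(os.environ)
env["SPARK_SUBMIT_OPTS"] = (
    "-Dhttp.proxyHost=IP -Dhttp.proxyPort=8080 "
    "-Dhttps.proxyHost=IP -Dhttps.proxyPort=8080"
)
subprocess.run(
    ["spark-shell", "--packages",
     "datastax:spark-cassandra-connector:2.0.0-M2-s_2.11"],
    env=env,
    check=True,
)
{code}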

 


