Connection Reset by Peer : failed to remove cached rdd

2021-07-29 Thread Big data developer need help related to spark gateway roles in 2.0
Hi Team, We are facing an issue in production where we are getting frequent "Still have 1 request outstanding when connection with the hostname was closed" / "connection reset by peer" errors, as well as warnings: "failed to remove cached rdd" or "failed to remove broadcast variable". Please help us how to
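"Connection reset" and "requests outstanding" errors of this kind are often mitigated by raising Spark's network/heartbeat timeouts and letting the context cleaner GC more often. A hedged sketch of the relevant spark-submit flags (the conf keys are standard Spark settings, but the values and jar name are illustrative placeholders, not tuned recommendations):

```shell
# Illustrative values only; tune per cluster.
spark-submit \
  --conf spark.network.timeout=600s \
  --conf spark.executor.heartbeatInterval=60s \
  --conf spark.cleaner.periodicGC.interval=15min \
  your-app.jar
```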

how to manage HBase connections in Executors of Spark Streaming ?

2020-11-23 Thread big data
Hi, Are there any best practices for managing HBase connections with Kerberos authentication in a Spark Streaming (YARN) environment? We want to know how executors manage HBase connections: how to create them, how to close them, and how to refresh expiring Kerberos tickets. Thanks.
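A common pattern is to hold one connection per executor JVM in a lazily initialized singleton and use it from `foreachPartition`. The sketch below shows the pattern in plain Scala; `FakeConnection` is a hypothetical stand-in for an HBase `Connection` (real code would call `ConnectionFactory.createConnection` inside the lazy initializer and re-login from a keytab before the Kerberos ticket expires):

```scala
// Sketch of the one-connection-per-executor-JVM pattern often used for
// HBase access from Spark Streaming. FakeConnection is a stand-in for
// an HBase Connection.
class FakeConnection { def close(): Unit = () }

object ExecutorConnection {
  // A lazy val in an object is initialized once per JVM, i.e. once per
  // executor: every task on that executor reuses the same connection
  // instead of opening one per record or per partition.
  lazy val conn: FakeConnection = new FakeConnection()
}

// Inside rdd.foreachPartition { iter => ... } each task would call:
val c1 = ExecutorConnection.conn
val c2 = ExecutorConnection.conn
println(c1 eq c2) // true: one shared connection per JVM
```

Because the object lives on the executor and is never serialized from the driver, this also avoids `Task not serializable` problems with connection objects.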

spark-submit parameters about two keytab files to yarn and kafka

2020-10-28 Thread big data
Hi, We want to submit a spark streaming job to YARN and consume a Kafka topic. YARN and Kafka are in two different clusters, and they use different Kerberos authentication. We have two keytab files, one for YARN and one for Kafka. My question is how to add parameters to the spark-submit command for
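A commonly used split (a sketch under assumptions; all paths, principals, and the jar name are placeholders) is to give the YARN keytab to spark-submit directly via `--principal`/`--keytab`, and to supply the Kafka credentials separately through a JAAS file that references the second keytab, shipped to executors with `--files`:

```shell
# Placeholders throughout; adapt realms, paths, and file names.
spark-submit \
  --principal yarn_user@YARN.REALM \
  --keytab /path/to/yarn_user.keytab \
  --files /path/to/kafka_client_jaas.conf,/path/to/kafka_user.keytab \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka_client_jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=kafka_client_jaas.conf" \
  your-streaming-app.jar
```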

Java Generic T makes ClassNotFoundException

2019-06-27 Thread big data
Dear, I use Spark to deserialize some files to restore them to my own class objects. The Spark code and the class-deserialization code (using Apache Commons Lang) look like this: val fis = spark.sparkContext.binaryFiles("/folder/abc*.file") val RDD = fis.map(x => { val content = x._2.toArray() val b =
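A minimal sketch of the round-trip this snippet performs, using only JDK serialization in place of Commons Lang's `SerializationUtils` (`Record` is a hypothetical stand-in for the user's class). The `resolveClass` override shown is the usual fix when deserialization inside an executor throws `ClassNotFoundException` on YARN, because the default `ObjectInputStream` can resolve classes against the wrong classloader:

```scala
import java.io._

// Stand-in for the user's custom serializable class.
case class Record(id: Int, name: String) extends Serializable

// Equivalent of SerializationUtils.serialize using only the JDK.
def serialize(obj: AnyRef): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(obj); oos.close()
  bos.toByteArray
}

// Equivalent of SerializationUtils.deserialize, but resolving classes
// against the current thread's context classloader, which on YARN is the
// loader that actually holds the application's classes.
def deserialize(bytes: Array[Byte]): AnyRef = {
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes)) {
    override def resolveClass(desc: ObjectStreamClass): Class[_] =
      Class.forName(desc.getName, false, Thread.currentThread.getContextClassLoader)
  }
  try ois.readObject() finally ois.close()
}

val roundTripped = deserialize(serialize(Record(1, "abc"))).asInstanceOf[Record]
```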

Re: Array[Byte] from BinaryFiles can not be deserialized on Spark Yarn mode

2019-06-26 Thread big data
ve can run successfully in Spark local mode, but when we run it in YARN cluster mode, the error happens. On 2019/6/26 5:52 PM, big data wrote: I use Apache Commons Lang3's SerializationUtils in the code. SerializationUtils.serialize() to store a customized class as files on disk and Seriali

Re: [Pyspark 2.4] Best way to define activity within different time window

2019-06-09 Thread big data
In my opinion, Bitmap is the best solution for active-user calculation. Other solutions are mostly based on a count(distinct) calculation, which is much slower. If you've implemented a Bitmap solution, including how to build the Bitmap and how to load the Bitmap, then Bitmap is the best choice. On
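The idea can be sketched in plain Scala: key each user to a dense integer id, keep one bitmap per day, and combine days with bitwise OR/AND instead of re-counting distincts. Production systems typically use RoaringBitmap for this; `mutable.BitSet` below illustrates the same idea with the standard library only:

```scala
import scala.collection.mutable

// One bitmap per day; bit i is set if user id i was active that day.
val day1 = mutable.BitSet(1, 2, 3, 5)
val day2 = mutable.BitSet(2, 3, 8)

// Set operations replace count(distinct) over raw events:
val activeEitherDay = day1 | day2 // union: active on day1 OR day2
val activeBothDays  = day1 & day2 // intersection: active on both days

println(activeEitherDay.size) // distinct active users across both days
```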

How to improve binaryFiles performance?

2019-05-27 Thread big data
Hi all, I have many binary files stored in HDFS, and I use SparkContext.binaryFiles to load them into an RDD, then transform them for computation. The bottleneck is loading the files; are there any solutions to improve binary-file loading performance? Thanks.
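One common lever is the `minPartitions` argument of `binaryFiles`: with many small files the default partitioning can leave most cores idle. A sketch of a simple sizing heuristic (the `filesPerTask` default is an assumption for illustration, not a Spark default):

```scala
// Spread files across tasks: at least one partition per core, and no more
// than filesPerTask files per task. Purely a rule of thumb.
def suggestedMinPartitions(numFiles: Int, totalCores: Int, filesPerTask: Int = 4): Int =
  math.max(totalCores, math.ceil(numFiles.toDouble / filesPerTask).toInt)

// Usage against a SparkContext (assumed in scope):
// val rdd = sc.binaryFiles("/folder/abc*.file",
//                          minPartitions = suggestedMinPartitions(10000, 64))
```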

Maven dependency problem with spark-streaming-kafka_2.11:1.6.3

2018-12-16 Thread big data
Hi, our project includes this dependency: org.apache.spark spark-streaming-kafka_2.11 1.6.3 From the dependency tree, we can see it depends on kafka_2.11:0.8.2.1. But when we move this dependency to the parent pom file, the dependency

An exception makes different phenomena

2018-04-17 Thread big data
> Hi all, > > we have two environments for our spark streaming job, which consumes a Kafka > topic to do calculation. > > Now in one environment, the spark streaming job consumes non-standard > data from kafka and throws an exception (not caught in code), then the > streaming job goes down. > > But in

An exception makes different phenomena

2018-04-16 Thread big data
Hi all, we have two environments for our spark streaming job, which consumes a Kafka topic to do calculation. Now in one environment, the spark streaming job consumes non-standard data from Kafka and throws an exception (not caught in code), then the streaming job goes down. But in another environment,
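One way to keep a single malformed record from taking the whole streaming job down is to guard per-record parsing and drop (or divert) records that fail, rather than letting the exception escape the task. A sketch in plain Scala (`parseAmount` is a hypothetical parser standing in for the job's real decoding logic):

```scala
import scala.util.Try

// Per-record guard: a non-standard Kafka message yields None instead of
// throwing and failing the task (and eventually the streaming job).
def parseAmount(raw: String): Option[Double] =
  Try(raw.toDouble).toOption

val batch  = Seq("1.5", "not-a-number", "2.5") // one malformed record
val parsed = batch.flatMap(parseAmount)        // bad record dropped

println(parsed.sum)
```

In a real job the same guard would sit inside the `map`/`foreachRDD` that decodes Kafka messages, optionally routing the failures to a dead-letter sink instead of discarding them.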

Spark Application stuck

2018-03-14 Thread Mukund Big Data
Hi, I am executing the following recommendation engine using Spark ML: https://aws.amazon.com/blogs/big-data/building-a-recommendation-engine-with-spark-ml-on-amazon-emr-using-zeppelin/ When I am trying to save the model, the application hangs and doesn't respond. Any pointers to find where

Re: How to deal with string column data for spark mlib?

2016-12-20 Thread big data
to transfer the sex, country, attr1, attr2 columns' values to double type directly in the spark job. thanks. On 16/12/20 9:37 PM, theodondre wrote: Give a snippet of the data. Sent from my T-Mobile 4G LTE Device Original message From: big data <bigdatab...@outlook.com>

How to deal with string column data for spark mlib?

2016-12-20 Thread big data
our source data are string-based, like this: col1 col2 col3 ... aaa bbb ccc aa2 bb2 cc2 aa3 bb3 cc3 ... How can we convert all of these data to double to apply MLlib's algorithms? thanks.
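Spark ML provides `StringIndexer` for exactly this. The sketch below shows the underlying idea in plain Scala: map each distinct string in a column to a double index, ordered by descending frequency (as `StringIndexer` does by default), with ties broken alphabetically here for determinism:

```scala
// Build a string -> double encoding for one column: most frequent value
// gets 0.0, next gets 1.0, and so on (ties broken alphabetically).
def indexColumn(values: Seq[String]): Map[String, Double] =
  values.groupBy(identity)
        .toSeq
        .sortBy { case (v, occ) => (-occ.size, v) }
        .map(_._1)
        .zipWithIndex
        .map { case (v, i) => v -> i.toDouble }
        .toMap

val col1 = Seq("aaa", "aa2", "aaa", "aa3")
val idx  = indexColumn(col1)
println(idx("aaa")) // most frequent value -> 0.0
```

In real Spark code you would fit one `StringIndexer` (or a `Pipeline` of them) per string column and feed the resulting double columns to a `VectorAssembler`.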