Question regarding Kryo and Java encoders in Datasets

2019-01-03 Thread Devender Yadav
Hi All, Good day! I am using Spark 2.4 and referring to https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence Bean class: public class EmployeeBean implements Serializable { private Long id; private String name; private Long salary; private Integer
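The bean in the question is truncated in the archive; as a rough sketch (simplified two-field EmployeeBean, local master — class and field names beyond those quoted are assumptions), the difference between the bean and Kryo encoders, plus serialized persistence as in the linked guide, might look like:

```java
import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class EncoderDemo {
    // Simplified stand-in for the (truncated) EmployeeBean from the question.
    public static class EmployeeBean implements Serializable {
        private Long id;
        private String name;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[2]").appName("encoder-demo").getOrCreate();

        EmployeeBean e = new EmployeeBean();
        e.setId(1L);
        e.setName("alice");

        // Bean encoder: columnar, queryable schema (id, name).
        Encoder<EmployeeBean> beanEnc = Encoders.bean(EmployeeBean.class);
        Dataset<EmployeeBean> dsBean =
                spark.createDataset(Arrays.asList(e), beanEnc);

        // Kryo encoder: the whole object is stored as one binary column.
        Encoder<EmployeeBean> kryoEnc = Encoders.kryo(EmployeeBean.class);
        Dataset<EmployeeBean> dsKryo =
                spark.createDataset(Arrays.asList(e), kryoEnc);

        // Persist in serialized form, as in the linked RDD persistence docs.
        dsBean.persist(StorageLevels.MEMORY_ONLY_SER);

        dsBean.printSchema(); // id and name columns
        dsKryo.printSchema(); // a single binary "value" column

        spark.stop();
    }
}
```

The practical consequence: the Kryo-encoded Dataset is opaque to Catalyst (no column pruning or predicate pushdown), while the bean encoder exposes each field as a column.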

Re: How to reissue a delegated token after max lifetime passes for a spark streaming application on a Kerberized cluster

2019-01-03 Thread Marcelo Vanzin
Ah, man, there are a few known issues with KMS delegation tokens. The main one we've run into is HADOOP-14445, but it's only fixed in new versions of Hadoop. I wouldn't expect you guys to be running those, but if you are, it would be good to know. In our forks we added a hack to work around that

R: How to reissue a delegated token after max lifetime passes for a spark streaming application on a Kerberized cluster

2019-01-03 Thread Paolo Platter
Hi, The Spark default behaviour is to request a brand-new token every 24 hours rather than renew delegation tokens, which is the better approach for long-running applications like streaming ones. In our use case using keytab and principal is working fine with

Re: How to reissue a delegated token after max lifetime passes for a spark streaming application on a Kerberized cluster

2019-01-03 Thread Marcelo Vanzin
If you are using the principal / keytab params, Spark should create tokens as needed. If it's not, something else is going wrong, and only looking at full logs for the app would help. On Wed, Jan 2, 2019 at 5:09 PM Ali Nazemian wrote: > > Hi, > > We are using a headless keytab to run our
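As a hedged sketch of the principal/keytab submission the replies describe (principal, keytab path, class, and jar names are placeholders):

```
# Long-running submission with principal/keytab, so Spark can obtain fresh
# delegation tokens itself instead of relying on token renewal.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal svc_streaming@EXAMPLE.COM \
  --keytab /etc/security/keytabs/svc_streaming.keytab \
  --class com.example.StreamingApp \
  streaming-app.jar
```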

[Spark cluster standalone v2.4.0] - problems with reverse proxy functionality regarding submitted applications in cluster mode and the Spark history server UI

2019-01-03 Thread Cheikh_SOW
Hello, I have many Spark clusters in standalone mode with 3 nodes each. One of them is in HA with 3 masters and 3 workers and everything regarding the HA is working fine. The second one is not in HA mode and we have one master and 3 workers. In both of them, I have configured the reverse proxy
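For reference, the standalone-mode reverse proxy is driven by two configuration keys; a minimal spark-defaults.conf sketch (the gateway URL is a placeholder) might be:

```
# Route worker/application UIs through the master's reverse proxy.
spark.ui.reverseProxy     true
spark.ui.reverseProxyUrl  https://gateway.example.com/spark
```

Note that these keys cover the master, worker, and application UIs; the history server is a separate daemon, which is consistent with the problems reported here.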

Re: Spark jdbc postgres numeric array

2019-01-03 Thread Takeshi Yamamuro
Hi, I checked that v2.2/v2.3/v2.4/master had the same issue, so can you file a JIRA? I looked over the related code and I think we need more logic to handle this issue;
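Until a fix lands, one common workaround (not from this thread; the table and column names are hypothetical) is to cast the `numeric[]` column to `float8[]` inside the JDBC subquery, so Postgres hands Spark a type it already maps to `array<double>`, at the cost of decimal precision:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class NumericArrayWorkaround {
    // Pushes the cast down to Postgres so Spark sees float8[] (array<double>)
    // instead of the problematic numeric[]. Table/column names are placeholders.
    public static String castArraySubquery(String table, String col) {
        return "(SELECT id, " + col + "::float8[] AS " + col
                + " FROM " + table + ") AS t";
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[2]").appName("pg-numeric-array").getOrCreate();

        Dataset<Row> df = spark.read()
                .format("jdbc")
                .option("url", "jdbc:postgresql://db-host:5432/mydb")
                .option("dbtable", castArraySubquery("measurements", "vals"))
                .option("user", "spark")
                .option("password", "secret")
                .load();

        df.printSchema(); // vals arrives as array<double>
        spark.stop();
    }
}
```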

Using Spark as an ETL tool for moving data from Hive tables to BigQuery

2019-01-03 Thread Mich Talebzadeh
Hi, To move data from Hive to Google BigQuery, one needs to create a staging table in Hive in a storage format that can be read in BigQuery. Both the AVRO and ORC file formats in Hive work, but the files cannot be compressed. In addition, to handle both data types and Double types, best to convert
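A minimal sketch of the staging step (the source and staging table names are hypothetical, and the exact type conversion in the original message is truncated, so only the uncompressed-AVRO part is shown):

```java
import org.apache.spark.sql.SparkSession;

public class HiveToBigQueryStaging {
    // Builds a CTAS statement for an AVRO staging table readable by BigQuery.
    // Table names are placeholders.
    public static String stagingCtas(String source, String staging) {
        return "CREATE TABLE " + staging
                + " STORED AS AVRO AS SELECT * FROM " + source;
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[2]").appName("hive-to-bq-staging")
                .enableHiveSupport().getOrCreate();

        // Per the message above, BigQuery cannot load the compressed files,
        // so disable output compression before writing the staging table.
        spark.sql("SET hive.exec.compress.output=false");
        spark.sql(stagingCtas("sales", "sales_staging"));

        spark.stop();
    }
}
```

The staging files can then be copied to Cloud Storage and loaded into BigQuery with the usual `bq load --source_format=AVRO` path.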