Spark and Oozie

2019-07-18 Thread Dennis Suhari


Dear experts,

I am using Spark to process data from HDFS (Hadoop). These Spark 
applications are data pipelines, data wrangling and machine learning 
jobs, so Spark submits them via YARN. 
This works well. For scheduling I am now trying to use Apache Oozie, but I 
am seeing a performance impact: a Spark job that took 44 seconds when 
submitted via the CLI now takes nearly 3 minutes.
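
For reference, a minimal sketch of the kind of CLI baseline being compared
against here; the application name, options, and timing are placeholders, not
taken from the original message:

import subprocess
import time

# Hypothetical pipeline script and options; the original message does not
# include the actual spark-submit command.
cmd = [
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    "my_pipeline.py",
]

start = time.time()
subprocess.run(cmd, check=True)  # direct submission to YARN from the CLI
print("wall-clock time: %.0fs" % (time.time() - start))  # ~44s in the reported case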

Have you had similar experiences using Oozie to schedule Spark application 
jobs? What alternative workflow tools are you using for scheduling Spark jobs 
on Hadoop?


Br,

Dennis

Sent from my iPhone




Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-07-18 Thread Hyukjin Kwon
Hi all,

Seems this issue is happening again. The PR link is properly created in the
corresponding JIRA, but the JIRA's status does not change from OPEN to
IN-PROGRESS.

See, for instance,

https://issues.apache.org/jira/browse/SPARK-28443
https://issues.apache.org/jira/browse/SPARK-28440
https://issues.apache.org/jira/browse/SPARK-28436
https://issues.apache.org/jira/browse/SPARK-28434
https://issues.apache.org/jira/browse/SPARK-28433
https://issues.apache.org/jira/browse/SPARK-28431

Josh and Dongjoon, do you guys maybe have any idea?

On Thu, Apr 25, 2019 at 3:09 PM, Hyukjin Kwon wrote:

> Thank you so much Josh .. !!
>
> On Thu, Apr 25, 2019 at 3:04 PM, Josh Rosen wrote:
>
>> The code for this runs in http://spark-prs.appspot.com (see
>> https://github.com/databricks/spark-pr-dashboard/blob/1e799c9e510fa8cdc9a6c084a777436bebeabe10/sparkprs/controllers/tasks.py#L137
>> )
>>
>> I checked the AppEngine logs and it looks like we're getting error
>> responses, possibly due to a credentials issue:
>>
>>> Exception when starting progress on JIRA issue SPARK-27355
>>> (/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/controllers/tasks.py:142)
>>> Traceback (most recent call last):
>>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/controllers/tasks.py", line 138, in update_pr
>>>     start_issue_progress("%s-%s" % (app.config['JIRA_PROJECT'], issue_number))
>>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/jira_api.py", line 27, in start_issue_progress
>>>     jira_client = get_jira_client()
>>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/jira_api.py", line 18, in get_jira_client
>>>     app.config['JIRA_PASSWORD']))
>>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 472, in __init__
>>>     si = self.server_info()
>>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 2133, in server_info
>>>     j = self._get_json('serverInfo')
>>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 2549, in _get_json
>>>     r = self._session.get(url, params=params)
>>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/resilientsession.py", line 151
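
For readers following along: a rough sketch of what get_jira_client and
start_issue_progress appear to do, using the Python "jira" package that the
traceback points at. The credentials and the transition name below are
placeholders, not the dashboard's real configuration:

from jira import JIRA  # the 'jira' client library shown in the traceback

def get_jira_client():
    # Placeholder credentials; the real app reads them from its AppEngine
    # config (e.g. app.config['JIRA_PASSWORD'] in the traceback), and a bad
    # value there would produce error responses like the one above.
    return JIRA(server="https://issues.apache.org/jira",
                basic_auth=("some-user", "some-password"))

def start_issue_progress(issue_key):
    client = get_jira_client()
    # Look up the workflow transition that moves OPEN -> IN PROGRESS
    # ("Start Progress" is a common name, but it depends on the JIRA workflow).
    for t in client.transitions(issue_key):
        if t["name"].lower() == "start progress":
            client.transition_issue(issue_key, t["id"])
            break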

Using Custom Version of Hive with Spark

2019-07-18 Thread Valeriy Trofimov
Hi All,

I've created test tables in HiveCLI (druid1, druid2) and test tables in
Beeline (beeline1, beeline2).

I want to be able to access the Hive tables from Beeline and the Beeline tables
from Hive. Is that possible?

I've set up hive-site.xml for both Hive and Spark to use the same warehouse,
thinking that this should be enough for Hive and Spark to see the tables, but
for some reason Hive only sees the tables created in Hive and Beeline only sees
the tables created in Beeline (see screenshot).

[image: image.png]

What else can I do to make it work?
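
One thing worth checking (a hedged sketch, not taken from this thread): table
definitions live in the Hive metastore, not in the warehouse directory, so Spark
and HiveServer2 (Beeline) only see the same tables if they talk to the same
metastore service. The thrift URI and warehouse path below are placeholders:

from pyspark.sql import SparkSession

# Placeholder metastore URI and warehouse path; adjust to the actual cluster.
spark = (SparkSession.builder
         .appName("shared-metastore-check")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
         .enableHiveSupport()
         .getOrCreate())

# If both front ends point at the same metastore, this should list the
# druid1/druid2 tables as well as beeline1/beeline2.
spark.sql("SHOW TABLES").show()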

Thanks,
Val