[GitHub] spark issue #15722: [SPARK-18208] [Shuffle] Executor OOM due to a memory lea...

2016-11-01 Thread jiexiong
Github user jiexiong commented on the issue:

https://github.com/apache/spark/pull/15722
  
This is production query. Sorry, I could not share it. It is doing a join 
between two big tables. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15722: [SPARK-18208] [Shuffle] Executor OOM due to a memory lea...

2016-11-01 Thread jiexiong
Github user jiexiong commented on the issue:

https://github.com/apache/spark/pull/15722
  
Here is the query:
Here is the query:

INSERT OVERWRITE TABLE 
lookalike_trainer_campaign_conv_users_with_country_shadow
PARTITION(ds='2016-10-19')
SELECT c.source_id,
c.country,
c.user_id,
c.conversion_time
FROM (
SELECT b.source_id,
b.country,
b.user_id,
b.conversion_time,
FB_NUMBER_ROWS(b.country, b.source_id) as rank
FROM (
SELECT source_id,
country,
user_id,
MAX(conversion_time) / 1000 AS conversion_time
FROM (
SELECT v.campaigngroup_id,
v.campaign_id,
v.adgroup_id,
v.user_id,
Y.country,
v.last_conversion_time AS conversion_time
FROM dim_all_users_fast:bi Y
JOIN lookalike_trainer_campaign_conv_raw v
ON  v.user_id = Y.userid
WHERE v.ds='2016-10-19'
AND Y.ds = '2016-10-19'
AND Y.country IS NOT NULL
) a LATERAL VIEW
EXPLODE(ARRAY(campaigngroup_id, campaign_id, adgroup_id)) s
AS source_id
GROUP BY country, source_id, user_id
DISTRIBUTE by country, source_id
SORT BY country, source_id, conversion_time DESC
) b
) c  WHERE rank <= 6

Before the fix, it would fail from OOM error. After the fix, the OOM error 
went away.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15722: [SPARK-18208] [Shuffle] Executor OOM due to a memory lea...

2016-11-01 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/15722
  
OK to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15722: [SPARK-18208] [Shuffle] Executor OOM due to a memory lea...

2016-11-01 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/15722
  
@jiexiong I don't think this is a memory leak, BytesToBytesMap does not 
release all memory for each spilling based on the assumption that the memory 
will be acquired back soon. What's the query that make you think this is a leak?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15722: [SPARK-18208] [Shuffle] Executor OOM due to a memory lea...

2016-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15722
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org