[jira] [Created] (SPARK-10774) Put different event log to different directory according to some conditions

yangping wu (JIRA) Wed, 23 Sep 2015 04:49:38 -0700

yangping wu created SPARK-10774:
-----------------------------------

             Summary: Put different event log to different directory according to some conditions
                 Key: SPARK-10774
                 URL: https://issues.apache.org/jira/browse/SPARK-10774
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.4.1
            Reporter: yangping wu
            Priority: Minor



Right now, Spark writes all event logs (in-progress or finished) into the same 
directory (configured by the **spark.eventLog.dir** parameter), as follows:
{noformat}
[yangping...@l-sparkcluster.data.cn5 /]$ sudo hadoop fs -ls /spark-jobs/eventLog
Found 58 items
-rwxrwxrwx   3 spark aaa        8438 2015-09-17 15:14 /spark-jobs/eventLog/application_1440152921247_0047_1.lz4
-rwxrwxrwx   3 spark aaa       44002 2015-09-17 15:15 /spark-jobs/eventLog/application_1440152921247_0190_1
-rwxrwxrwx   3 spark aaa       44696 2015-09-17 15:15 /spark-jobs/eventLog/application_1440152921247_0190_2
-rwxrwxrwx   3 spark aaa       40813 2015-09-17 15:25 /spark-jobs/eventLog/application_1440152921247_0191_1
-rwxrwxrwx   3 spark aaa       44680 2015-09-17 15:25 /spark-jobs/eventLog/application_1440152921247_0191_2
-rwxrwxrwx   3 spark aaa       42572 2015-09-17 15:36 /spark-jobs/eventLog/application_1440152921247_0192_1
-rwxrwxrwx   3 spark aaa       44680 2015-09-17 15:36 /spark-jobs/eventLog/application_1440152921247_0192_2
-rwxrwxrwx   3 spark aaa       45052 2015-09-17 16:09 /spark-jobs/eventLog/application_1440152921247_0193_1
-rwxrwxrwx   3 spark aaa       44688 2015-09-17 16:09 /spark-jobs/eventLog/application_1440152921247_0193_2
-rwxrwxrwx   3 spark aaa       41686 2015-09-17 16:11 /spark-jobs/eventLog/application_1440152921247_0194_1
-rwxrwxrwx   3 spark aaa       44522 2015-09-17 16:11 /spark-jobs/eventLog/application_1440152921247_0194_2
-rwxrwxrwx   3 spark aaa       32261 2015-09-17 16:13 /spark-jobs/eventLog/application_1440152921247_0195_1
-rwxrwxrwx   3 spark aaa       31178 2015-09-17 16:13 /spark-jobs/eventLog/application_1440152921247_0195_2
-rwxrwxrwx   3 spark aaa 39124467712 2015-09-18 11:58 /spark-jobs/eventLog/application_1440152921247_0205_1.inprogress
-rwxrwxrwx   3 spark aaa   790045092 2015-09-18 20:40 /spark-jobs/eventLog/application_1440152921247_0206
........
{noformat}
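
To make the mix of states in that single directory concrete, here is a minimal Python sketch that groups a listing into in-progress and finished logs. It assumes the only marker of an unfinished log is the ".inprogress" suffix seen in the listing above; the function name split_by_state is hypothetical and is not part of Spark.

```python
from typing import Dict, Iterable, List

# Assumption: unfinished event logs end with this suffix, as in the listing above.
IN_PROGRESS_SUFFIX = ".inprogress"


def split_by_state(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group event log paths into 'inprogress' and 'finished' by file suffix."""
    groups: Dict[str, List[str]] = {"inprogress": [], "finished": []}
    for path in paths:
        key = "inprogress" if path.endswith(IN_PROGRESS_SUFFIX) else "finished"
        groups[key].append(path)
    return groups


# Three of the paths from the listing above.
logs = [
    "/spark-jobs/eventLog/application_1440152921247_0047_1.lz4",
    "/spark-jobs/eventLog/application_1440152921247_0205_1.inprogress",
    "/spark-jobs/eventLog/application_1440152921247_0206",
]
groups = split_by_state(logs)
print(groups["inprogress"])
print(groups["finished"])
```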

As time goes by, the **spark.eventLog.dir** directory accumulates a lot of 
event logs and becomes hard to manage. Hadoop uses two directories for the 
different kinds of event logs, done-dir and intermediate-done-dir, configured 
by **mapreduce.jobhistory.done-dir** and 
**mapreduce.jobhistory.intermediate-done-dir** respectively. Within the 
"done-dir", event logs are saved to different directories according to when 
the job ran, as follows:
{noformat}
[yangping...@l-sparkcluster.data.cn5 /]$ sudo hadoop fs -ls /hadoop-jobs/done/2015/09/
Found 23 items
drwxrwxrwx   - hadoop supergroup    0 2015-09-04 16:59 /hadoop-jobs/done/2015/09/01
drwxrwxrwx   - hadoop supergroup    0 2015-09-05 16:59 /hadoop-jobs/done/2015/09/02
drwxrwxrwx   - hadoop supergroup    0 2015-09-06 16:59 /hadoop-jobs/done/2015/09/03
drwxrwxrwx   - hadoop supergroup    0 2015-09-07 16:59 /hadoop-jobs/done/2015/09/04
drwxrwxrwx   - hadoop supergroup    0 2015-09-08 16:59 /hadoop-jobs/done/2015/09/05
drwxrwxrwx   - hadoop supergroup    0 2015-09-09 16:59 /hadoop-jobs/done/2015/09/06
drwxrwxrwx   - hadoop supergroup    0 2015-09-10 16:59 /hadoop-jobs/done/2015/09/07
drwxrwxrwx   - hadoop supergroup    0 2015-09-11 16:59 /hadoop-jobs/done/2015/09/08
drwxrwxrwx   - hadoop supergroup    0 2015-09-12 16:59 /hadoop-jobs/done/2015/09/09
drwxrwxrwx   - hadoop supergroup    0 2015-09-13 16:59 /hadoop-jobs/done/2015/09/10
drwxrwx---   - hadoop supergroup    0 2015-09-14 16:59 /hadoop-jobs/done/2015/09/11
drwxrwx---   - hadoop supergroup    0 2015-09-15 16:59 /hadoop-jobs/done/2015/09/12
drwxrwxrwx   - hadoop supergroup    0 2015-09-16 16:59 /hadoop-jobs/done/2015/09/13
drwxrwxrwx   - hadoop supergroup    0 2015-09-17 16:59 /hadoop-jobs/done/2015/09/14
drwxrwxrwx   - hadoop supergroup    0 2015-09-18 16:59 /hadoop-jobs/done/2015/09/15
drwxrwxrwx   - hadoop supergroup    0 2015-09-19 16:59 /hadoop-jobs/done/2015/09/16
drwxrwxrwx   - hadoop supergroup    0 2015-09-20 16:59 /hadoop-jobs/done/2015/09/17
drwxrwx---   - hadoop supergroup    0 2015-09-21 16:59 /hadoop-jobs/done/2015/09/18
drwxrwx---   - hadoop supergroup    0 2015-09-22 16:59 /hadoop-jobs/done/2015/09/19
drwxrwx---   - hadoop supergroup    0 2015-09-23 16:59 /hadoop-jobs/done/2015/09/20
drwxrwx---   - hadoop supergroup    0 2015-09-23 16:59 /hadoop-jobs/done/2015/09/21
drwxrwx---   - hadoop supergroup    0 2015-09-22 23:43 /hadoop-jobs/done/2015/09/22
drwxrwx---   - hadoop supergroup    0 2015-09-23 18:55 /hadoop-jobs/done/2015/09/23
{noformat}

In Spark, I think we can do the same thing.
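
As a rough illustration of what such a layout could mean for Spark, the Python sketch below computes a year/month/day target path for a finished event log, mirroring the Hadoop "done-dir" listing above. It is only a sketch: the function name dated_log_path and the "/spark-jobs/eventLog/done" base directory are hypothetical, and nothing here reflects an existing Spark option.

```python
import posixpath
from datetime import datetime


def dated_log_path(base_dir: str, log_name: str, finished_at: datetime) -> str:
    """Return base_dir/YYYY/MM/DD/log_name for a log that finished at finished_at."""
    # HDFS paths use forward slashes, so posixpath is used on every platform.
    return posixpath.join(base_dir, finished_at.strftime("%Y/%m/%d"), log_name)


target = dated_log_path(
    "/spark-jobs/eventLog/done",       # hypothetical "done" directory
    "application_1440152921247_0206",  # log name taken from the first listing
    datetime(2015, 9, 18, 20, 40),     # modification time shown for that log
)
print(target)
# prints /spark-jobs/eventLog/done/2015/09/18/application_1440152921247_0206
```

Keying the directory on the completion date keeps each day's logs together, so old days can be archived or deleted as a unit.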



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

