[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2015-06-04 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572728#comment-14572728
 ] 

Vinayakumar B commented on HDFS-5952:
-

Are there any updates on this tool?

I would really like to see it move forward with the new ideas.

> Create a tool to run data analysis on the PB format fsimage
> ---
>
> Key: HDFS-5952
> URL: https://issues.apache.org/jira/browse/HDFS-5952
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.6.0
>Reporter: Akira AJISAKA
>  Labels: BB2015-05-TBR
> Attachments: HDFS-5952.patch
>
>
> The delimited processor in OfflineImageViewer has not been supported since 
> HDFS-5698 was merged.
> The motivation for the delimited processor is to run data analysis on the 
> fsimage, so it may be more valuable to create a tool for Hive or Pig that 
> reads the PB-format fsimage directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525030#comment-14525030
 ] 

Hadoop QA commented on HDFS-5952:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12663638/HDFS-5952.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10700/console |


This message was automatically generated.

> Create a tool to run data analysis on the PB format fsimage
> ---
>
> Key: HDFS-5952
> URL: https://issues.apache.org/jira/browse/HDFS-5952
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.6.0
>Reporter: Akira AJISAKA
> Attachments: HDFS-5952.patch
>
>
> The delimited processor in OfflineImageViewer has not been supported since 
> HDFS-5698 was merged.
> The motivation for the delimited processor is to run data analysis on the 
> fsimage, so it may be more valuable to create a tool for Hive or Pig that 
> reads the PB-format fsimage directly.





[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524971#comment-14524971
 ] 

Hadoop QA commented on HDFS-5952:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12663638/HDFS-5952.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10677/console |


This message was automatically generated.



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-10-02 Thread Hao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157571#comment-14157571
 ] 

Hao Chen commented on HDFS-5952:


[~eddyxu] External storage is certainly one valid way to address the RAM 
consumption problem. Please feel free to take it over and move it forward.



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-10-02 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157309#comment-14157309
 ] 

Haohui Mai commented on HDFS-5952:
--

I've explored the direction of using leveldb in HDFS-6293:

https://issues.apache.org/jira/browse/HDFS-6293?focusedCommentId=13989358&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13989358

Please feel free to take the patch and drive it forward.



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-10-02 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157281#comment-14157281
 ] 

Lei (Eddy) Xu commented on HDFS-5952:
-

Hey, [~haoch] and [~wheat9],

First, thank you for the work you have done.

I've been looking at writing a tool that uses an external DB (e.g., leveldb) to 
process the new-style protobuf-based fsimage. Using leveldb removes the RAM 
limitation (i.e., having to load all inodes into RAM first). This would be more 
convenient for people who don't want to lose the information in the new image 
(such as xattrs) but who still want delimited output. It would be great if I 
could build on [~haoch]'s work, and of course I would love to help get this 
patch in.

What do you think about this?
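
To make the external-DB idea concrete, here is a minimal sketch of the two-pass approach in Python. It is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for leveldb, and the record shapes ((id, name) pairs and (child, parent) links) and the function names are hypothetical, not the actual fsimage layout or any existing patch.

```python
import sqlite3


def build_store(db_path, inodes, parent_links):
    """Pass 1: persist inode names and parent links in an on-disk store.

    sqlite3 is a stand-in for leveldb here. `inodes` yields (id, name)
    pairs and `parent_links` yields (child_id, parent_id) pairs; both
    shapes are hypothetical. Iterables are consumed lazily, so the full
    inode set never has to sit in RAM.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inode "
        "(id INTEGER PRIMARY KEY, name TEXT, parent INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO inode (id, name) VALUES (?, ?)", inodes
    )
    conn.executemany(
        "UPDATE inode SET parent = ? WHERE id = ?",
        ((parent, child) for child, parent in parent_links),
    )
    conn.commit()
    return conn


def full_path(conn, inode_id):
    """Pass 2: resolve a full path by walking parent links in the store."""
    parts = []
    while inode_id is not None:
        row = conn.execute(
            "SELECT name, parent FROM inode WHERE id = ?", (inode_id,)
        ).fetchone()
        if row is None:
            raise KeyError(inode_id)
        name, inode_id = row
        parts.append(name)
    # The root inode is assumed to have an empty name, so it is skipped.
    return "/" + "/".join(p for p in reversed(parts) if p)


# Tiny demo; pass a file path instead of ":memory:" to keep the store on disk.
conn = build_store(
    ":memory:",
    [(1, ""), (2, "data"), (3, "b.txt")],
    [(2, 1), (3, 2)],
)
print(full_path(conn, 3))
```

The trade-off is the usual one: each path lookup costs a few indexed reads instead of an in-memory map access, in exchange for a memory footprint that no longer grows with the number of inodes.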



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-10-02 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156866#comment-14156866
 ] 

Haohui Mai commented on HDFS-5952:
--

Users can export legacy oiv to generate the old delimited format. See HDFS-6293.



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-10-01 Thread Hao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156092#comment-14156092
 ] 

Hao Chen commented on HDFS-5952:


For OIV performance, please refer to: 
https://issues.apache.org/jira/browse/HDFS-6914.



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-10-01 Thread Hao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156091#comment-14156091
 ] 

Hao Chen commented on HDFS-5952:


I have tested this processor on a large PB-based fsimage of about 8 GiB. 
Processing it used to consume about 85 GiB of memory and now takes about 
30 GiB (roughly a third).

In fact, we are using this processor in production for all our clusters now. 
It runs alongside the NameNode without affecting its performance, and we rely 
on it heavily for daily Hadoop storage management, not just for ad hoc 
troubleshooting, so I am certainly willing to bring it back to trunk if it can 
help others too.



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-10-01 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155000#comment-14155000
 ] 

Lei (Eddy) Xu commented on HDFS-5952:
-

Hi, [~haoch].

Thank you very much for your work. Bringing back support for the OIV delimited 
processor will be very helpful. I just have a few small questions.

Are you going to get it into trunk? Also, have you tried processing a large 
fsimage (e.g., 16 GB)? Would you mind sharing the results with us?

It would be great to have this functionality back in trunk.



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147683#comment-14147683
 ] 

Hadoop QA commented on HDFS-5952:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663638/HDFS-5952.patch
  against trunk revision dff95f7.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8203//console

This message is automatically generated.



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-02-19 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906220#comment-13906220
 ] 

Akira AJISAKA commented on HDFS-5952:
-

Thank you for your comment.
I'm okay with using the XML-based tool, and I don't want to duplicate the code.

> Create a tool to run data analysis on the PB format fsimage
> ---
>
> Key: HDFS-5952
> URL: https://issues.apache.org/jira/browse/HDFS-5952
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 3.0.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>
> The delimited processor in OfflineImageViewer has not been supported since 
> HDFS-5698 was merged.
> The motivation for the delimited processor is to run data analysis on the 
> fsimage, so it may be more valuable to create a tool for Hive or Pig that 
> reads the PB-format fsimage directly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-02-18 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904434#comment-13904434
 ] 

Haohui Mai commented on HDFS-5952:
--

Is it okay to use the XML-based tool for debugging? Otherwise you'll end up 
duplicating the code in {{PBImageXmlWriter}} to parse the fsimage.

Note that the XML / delimited formats are intended to capture all internal 
details of the fsimage. I understand that the delimited format is more compact 
than the XML one. However, the delimited format does not include a schema, so 
it could be problematic when the format of the fsimage changes. Unfortunately 
we change the fsimage format quite often. :-(

If you really want to output in delimited format, I think it might be easier to 
take the output of {{PBImageXmlWriter}} and use SAX to convert the XML into 
the delimited format. It should work fairly efficiently.
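
As a rough illustration of the SAX conversion suggested above, the sketch below streams an XML dump and writes one delimited row per inode. It is written in Python with the standard xml.sax module. The element names (INodeSection, inode, id, type, name, mtime), the sample document, and the function names are assumptions made for the example and may not match the real {{PBImageXmlWriter}} output.

```python
import io
import xml.sax


class InodeToDelimited(xml.sax.ContentHandler):
    """Emits one delimited row per <inode> found inside <INodeSection>."""

    def __init__(self, out, fields, delimiter):
        super().__init__()
        self.out, self.fields, self.delimiter = out, fields, delimiter
        self.in_section = False  # True while inside <INodeSection>
        self.row = None          # fields of the inode being read, or None
        self.depth = 0           # nesting depth below the current <inode>
        self.field = None        # direct child currently being captured
        self.text = []

    def startElement(self, name, attrs):
        if self.row is None:
            if name == "INodeSection":
                self.in_section = True
            elif name == "inode" and self.in_section:
                self.row, self.depth = {}, 0
            return
        self.depth += 1
        # Capture only direct children, so a nested <id> (for example in a
        # block list) does not overwrite the inode's own id.
        if self.depth == 1 and name in self.fields:
            self.field, self.text = name, []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if self.row is None:
            if name == "INodeSection":
                self.in_section = False
            return
        if self.depth == 0:  # this is the closing </inode>
            cols = (self.row.get(f, "") for f in self.fields)
            self.out.write(self.delimiter.join(cols) + "\n")
            self.row = None
            return
        if self.depth == 1 and self.field is not None:
            self.row[self.field] = "".join(self.text)
            self.field = None
        self.depth -= 1


def xml_to_delimited(source, out, fields=("id", "type", "name", "mtime"),
                     delimiter="\t"):
    """Streams `source` (a file name or binary stream) into `out`."""
    xml.sax.parse(source, InodeToDelimited(out, fields, delimiter))


# Hypothetical, heavily simplified sample input.
SAMPLE = b"""<fsimage><INodeSection>
<inode><id>16385</id><type>DIRECTORY</type><name>data</name><mtime>1</mtime></inode>
<inode><id>16386</id><type>FILE</type><name>a.txt</name><mtime>2</mtime>
<blocks><block><id>1073741825</id></block></blocks></inode>
</INodeSection></fsimage>"""

buf = io.StringIO()
xml_to_delimited(io.BytesIO(SAMPLE), buf)
print(buf.getvalue(), end="")
```

Because SAX is event-driven, memory use stays flat regardless of image size. The sketch still inherits the schema concern raised above: the column list is fixed in code and would need updating whenever the XML layout changes.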



[jira] [Commented] (HDFS-5952) Create a tool to run data analysis on the PB format fsimage

2014-02-14 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902248#comment-13902248
 ] 

Akira AJISAKA commented on HDFS-5952:
-

Rethinking this idea, it is good for data analysis but not for 
troubleshooting: running Hive/Pig jobs is too costly when a cluster is in 
trouble.

Therefore, a tool to dump the fsimage into text format is still needed.
The tool will output two text files:
* files/dirs information
* snapshot diffs

Users can then analyze namespaces, or run lsr against snapshots, with tools 
such as SQLite.
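
For illustration, the following Python sketch shows how the files/dirs text file could be loaded into SQLite and queried. It covers only the file listing, not the snapshot diffs. The three-column tab-delimited layout (path, is_dir, size) and the function names are hypothetical, since the dump format is not defined yet.

```python
import sqlite3


def load_listing(conn, lines, delimiter="\t"):
    """Loads delimited rows of (path, is_dir, size) into an `inodes` table.

    The column layout is a hypothetical example, not a defined format.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inodes "
        "(path TEXT PRIMARY KEY, is_dir INTEGER, size INTEGER)"
    )
    rows = (line.rstrip("\n").split(delimiter) for line in lines)
    conn.executemany(
        "INSERT INTO inodes VALUES (?, ?, ?)",
        ((path, int(is_dir), int(size)) for path, is_dir, size in rows),
    )
    conn.commit()


def lsr(conn, directory):
    """Recursive listing: all (path, size) entries below `directory`."""
    prefix = directory.rstrip("/") + "/"
    # substr() is used instead of LIKE so that '_' and '%' in path names
    # are compared literally.
    cur = conn.execute(
        "SELECT path, size FROM inodes "
        "WHERE substr(path, 1, ?) = ? ORDER BY path",
        (len(prefix), prefix),
    )
    return cur.fetchall()


def total_size(conn, directory):
    """Sum of file sizes below `directory` (0 if nothing matches)."""
    prefix = directory.rstrip("/") + "/"
    cur = conn.execute(
        "SELECT COALESCE(SUM(size), 0) FROM inodes "
        "WHERE is_dir = 0 AND substr(path, 1, ?) = ?",
        (len(prefix), prefix),
    )
    return cur.fetchone()[0]


# Hypothetical dump contents.
DUMP = [
    "/data\t1\t0\n",
    "/data/a.txt\t0\t10\n",
    "/data/sub\t1\t0\n",
    "/data/sub/b.txt\t0\t5\n",
    "/tmp/c.txt\t0\t7\n",
]

conn = sqlite3.connect(":memory:")
load_listing(conn, DUMP)
print(lsr(conn, "/data"))
print(total_size(conn, "/data"))
```

The same queries can be run from the sqlite3 command-line shell on a database file, which keeps the workflow light enough to use on a single machine while a cluster is being diagnosed.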
