[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-07-04 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13702:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13702-branch-1-v2.patch, 
 HBASE-13702-branch-1-v3.patch, HBASE-13702-branch-1.patch, 
 HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, 
 HBASE-13702-v5.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (keeps a count though).
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is
 encountered.
 Being able to easily determine which rows are corrupted in an input, rather
 than failing on one row at a time, seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in such kinds of tools,
 which essentially does a quick run of the tool without making any changes but
 reports any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In the worst
 case, all rows will be logged and the size of the logs will be the same as the
 input size, which seems fine. However, the user might have to do some work
 figuring out where the logs are. Is there some link we can show to the user
 when the tool starts which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and
 any other mutations, if present.
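
The two ideas in the description (log every bad row instead of failing on the first, and use an if-else to skip writes during a dry run) can be sketched as follows. This is a minimal illustration, not the code in the attached patches: the class `TsvDryRunSketch`, its `run` method, the `dryRun` and `logBadLines` parameters, and the "wrong field count or empty row key" rule are all hypothetical, and an in-memory list stands in for writing KVs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical stand-in for a TSV import step; not the HBase ImportTsv API. */
class TsvDryRunSketch {
    private static final Logger LOG = Logger.getLogger(TsvDryRunSketch.class.getName());

    /** Outcome of one pass over the input. */
    static final class Result {
        final List<String> written = new ArrayList<>(); // rows that reached the sink
        final List<String> badRows = new ArrayList<>(); // rows that failed validation
        int goodCount = 0;                              // rows that parsed cleanly
    }

    /**
     * Splits each line on tabs. A line is treated as bad (an assumption of this
     * sketch) when its field count differs from expectedColumns or its first
     * field, the row key, is empty.
     */
    static Result run(List<String> lines, int expectedColumns,
                      boolean dryRun, boolean logBadLines) {
        Result result = new Result();
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            boolean bad = fields.length != expectedColumns || fields[0].isEmpty();
            if (bad) {
                // Record and optionally log the row, then keep going.
                result.badRows.add(line);
                if (logBadLines) {
                    LOG.warning("Bad line: " + line);
                }
                continue;
            }
            result.goodCount++;
            if (!dryRun) {
                // Stand-in for writing out KVs; skipped entirely in a dry run.
                result.written.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("row1\ta\tb", "broken-line", "row2\tc\td");
        Result r = run(input, 3, true, true);
        // Prints: parsed=2 bad=1 written=0
        System.out.println("parsed=" + r.goodCount
            + " bad=" + r.badRows.size()
            + " written=" + r.written.size());
    }
}
```

In this sketch a dry run still parses and validates every row, so the counts and the logged bad rows match what a real run would report; only the write is skipped.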



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-07-02 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-branch-1-v3.patch

[~tedyu] Fixed the test. One of the tests was failing because an existing
test was directly changing the global configuration (util.getConfiguration()) in
its test body, which affected any tests that ran later. :-/
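
The kind of leak described here can be illustrated in isolation. The sketch below is a generic illustration only: it uses `java.util.Properties` as a stand-in for the shared test configuration, and the class, method, and key names are hypothetical. It does not show which test was at fault or how the patch changed it.

```java
import java.util.Properties;

/** Hypothetical illustration; Properties stands in for a shared test configuration. */
class SharedConfigSketch {
    // One configuration object shared by every test in the run.
    static final Properties SHARED = new Properties();

    /** Leaky pattern: mutates the shared object, so later tests see the change. */
    static String leakyTest() {
        SHARED.setProperty("skip.bad.lines", "false");
        return SHARED.getProperty("skip.bad.lines");
    }

    /** Isolated pattern: copies the shared settings and mutates only the copy. */
    static String isolatedTest() {
        Properties local = new Properties();
        local.putAll(SHARED);
        local.setProperty("skip.bad.lines", "false");
        return local.getProperty("skip.bad.lines");
    }

    /** A later test that relies on the default (an unset key means "true"). */
    static String laterTest() {
        return SHARED.getProperty("skip.bad.lines", "true");
    }
}
```

After `isolatedTest()` the later test still sees the default, whereas after `leakyTest()` it sees `"false"`, which is the order-dependent failure described above.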

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-30 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-branch-1.patch

[~tedyu] Sorry for the late response. PFA patch for branch-1.

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-30 Thread Apekshit Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-branch-1-v2.patch

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-v5.patch

[~tedyu] You're right. Fixed the issues.

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Status: Patch Available (was: Open)

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Yu updated HBASE-13702:
---
Status: Open (was: Patch Available)

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Yu updated HBASE-13702:
---
Fix Version/s: 1.3.0, 2.0.0

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-22 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-v2.patch

v2 patch.
fixed some naming issues and merge conflicts.

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-22 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-v3.patch

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-22 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-v4.patch

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-05-26 Thread Srikanth Srungarapu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Srungarapu updated HBASE-13702:

Assignee: Apekshit Sharma

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-05-26 Thread Apekshit Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702.patch

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-05-26 Thread Apekshit Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-13702:

Status: Patch Available  (was: Open)

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-05-15 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Description:
ImportTSV job skips bad records by default (keeps a count though).
-Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is
encountered.
Being able to easily determine which rows are corrupted in an input, rather
than failing on one row at a time, seems like a good feature to have.
Moreover, there should be 'dry-run' functionality in such kinds of tools, which
essentially does a quick run of the tool without making any changes but
reports any errors/warnings and success/failure.

To identify corrupted rows, simply logging them should be enough. In the worst
case, all rows will be logged and the size of the logs will be the same as the
input size, which seems fine. However, the user might have to do some work
figuring out where the logs are. Is there some link we can show to the user when
the tool starts which can help them with that?

For the dry run, we can simply use if-else to skip over creating table, writing
out KVs, and other mutations.

was:
ImportTSV job skips bad records by default (keeps a count though).
-Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is
encountered.
Being able to easily determine which rows are corrupted in an input, rather
than failing on one row at a time, seems like a good feature to have.
Moreover, there should be 'dry-run' functionality in such kinds of tools, which
essentially does a quick run of the tool without making any changes but
reports any errors/warnings and success/failure.

To identify corrupted rows, simply logging them should be enough. In the worst
case, all rows will be logged and the size of the logs will be the same as the
input size, which seems fine. However, the user might have to do some work
figuring out where the logs are. If there some link we can show the user in the
starting which can help them with that?

For the dry run, we can simply use if-else to skip over creating table, writing
out KVs, etc.

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-05-15 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-13702:

Description:
For the dry run, we can simply use if-else to skip over writing out KVs, and
any other mutations, if present.

was:
For the dry run, we can simply use if-else to skip over creating table, writing
out KVs, and other mutations.
