[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10416:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
Assignee: Vasu Mariyala
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10416-rev1.patch, HBASE-10416.patch


 The following improvements can be made to the Import logic:
 a) Make the import extensible: change the filter from a static member of 
 Import to an instance variable of the mapper, and make the mappers and 
 variables of interest protected.
 b) Make sure that Import calls the filterRowKey method of the filter. This is 
 useful for filtering an organization's data by row key, or for filters such 
 as PrefixFilter that filter data in filterRowKey rather than in 
 filterKeyValue. The existing test case TestImportExport#testWithFilter 
 relies on this behavior, but has passed so far only because a single row is 
 inserted into the table.
 c) Provide an option to specify the durability during the import. Setting 
 the Durability to SKIP_WAL would improve restore performance considerably. 
 [~lhofhansl] suggested that this should be a parameter to the import.
 d) Some minor refactoring to avoid building a comma-separated string for the 
 filter args.
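
 To illustrate item b), here is a minimal, dependency-free sketch of why the 
 row-key hook matters. It is not the code from the patch: RowFilter, 
 PrefixRowFilter and importRows are hypothetical stand-ins, and only the 
 filterRowKey(byte[], int, int) signature and its "true means exclude" 
 convention are taken from the HBase Filter API. With more than one row in the 
 input, a prefix-style filter only has an effect if the import loop actually 
 consults filterRowKey.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyFilterSketch {

    /** Hypothetical stand-in for the row-key hook of an HBase filter. */
    interface RowFilter {
        // Returns true when the row should be excluded, mirroring the
        // convention of Filter#filterRowKey.
        boolean filterRowKey(byte[] buffer, int offset, int length);
    }

    /** Keeps only rows whose key starts with the given prefix. */
    static final class PrefixRowFilter implements RowFilter {
        private final byte[] prefix;

        PrefixRowFilter(String prefix) {
            this.prefix = prefix.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public boolean filterRowKey(byte[] buffer, int offset, int length) {
            if (length < prefix.length) {
                return true; // key is shorter than the prefix: exclude
            }
            for (int i = 0; i < prefix.length; i++) {
                if (buffer[offset + i] != prefix[i]) {
                    return true; // mismatch: exclude
                }
            }
            return false; // prefix matches: keep the row
        }
    }

    /** Simulates the import loop: consult the row-key hook before emitting a row. */
    static List<String> importRows(List<String> rowKeys, RowFilter filter) {
        List<String> imported = new ArrayList<>();
        for (String key : rowKeys) {
            byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
            if (filter != null && filter.filterRowKey(bytes, 0, bytes.length)) {
                continue; // row rejected by its key; none of its cells are imported
            }
            imported.add(key);
        }
        return imported;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("org1-row1", "org2-row1", "org1-row2");
        // prints [org1-row1, org1-row2]
        System.out.println(importRows(keys, new PrefixRowFilter("org1-")));
    }
}
```

 A test with a single matching row would pass even if the filterRowKey call 
 were skipped, which is the gap described for TestImportExport#testWithFilter.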



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-28 Thread Vasu Mariyala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasu Mariyala updated HBASE-10416:
--

Release Note: 
Import with this fix supports 

a) Filtering of the row using the Filter#filterRowKey(byte[] buffer, int 
offset, int length).

b) Accepts durability parameter (Ex: -Dimport.wal.durability=SKIP_WAL ) while 
importing the data into HBase. If the data doesn't need to be replicated to the 
DR cluster or if the same import job would be run on the dr cluster, consider 
using SKIP_WAL durability for performance.
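
As a rough illustration of item b), the sketch below shows one way such a 
parameter could be resolved into a durability level. It is an assumption-laden 
sketch, not the code from the patch: the Map stands in for the Hadoop 
Configuration that -D options populate, the local enum mirrors the names of the 
HBase Durability levels so the example has no dependencies, and the fallback to 
USE_DEFAULT and the case-insensitive parsing are choices made for this example 
only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class DurabilityOptionSketch {

    /** Local mirror of the HBase Durability level names (sketch only). */
    enum Durability { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

    /** Property name taken from the release note above. */
    static final String WAL_DURABILITY_KEY = "import.wal.durability";

    /**
     * Resolves the durability from a configuration-like map.
     * Assumption: a missing or blank value falls back to USE_DEFAULT.
     * An unknown value raises IllegalArgumentException via Enum.valueOf.
     */
    static Durability resolveDurability(Map<String, String> conf) {
        String raw = conf.get(WAL_DURABILITY_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return Durability.USE_DEFAULT;
        }
        return Durability.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(WAL_DURABILITY_KEY, "SKIP_WAL");
        // prints SKIP_WAL
        System.out.println(resolveDurability(conf));
    }
}
```

In a real job the resolved value would presumably be applied to each mutation 
before it is written; that step is omitted here because it depends on the HBase 
client classes.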

 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
 Attachments: HBASE-10416.patch







[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-28 Thread Vasu Mariyala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasu Mariyala updated HBASE-10416:
--

Attachment: HBASE-10416-rev1.patch

 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
 Attachments: HBASE-10416-rev1.patch, HBASE-10416.patch







[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10416:
---

Release Note: 
Import with this fix supports 

a) Filtering of the row using the Filter#filterRowKey(byte[] buffer, int 
offset, int length).

b) Accepts durability parameter (Ex: -Dimport.wal.durability=SKIP_WAL ) while 
importing the data into HBase. If the data doesn't need to be replicated to the 
DR cluster or if the same import job would be run on the DR cluster, consider 
using SKIP_WAL durability for performance.

  was:
Import with this fix supports 

a) Filtering of the row using the Filter#filterRowKey(byte[] buffer, int 
offset, int length).

b) Accepts durability parameter (Ex: -Dimport.wal.durability=SKIP_WAL ) while 
importing the data into HBase. If the data doesn't need to be replicated to the 
DR cluster or if the same import job would be run on the dr cluster, consider 
using SKIP_WAL durability for performance.


 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
 Attachments: HBASE-10416-rev1.patch, HBASE-10416.patch







[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10416:
---

Hadoop Flags: Reviewed

 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
Assignee: Vasu Mariyala
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10416-rev1.patch, HBASE-10416.patch







[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10416:
---

Fix Version/s: 0.99.0, 0.98.0
Assignee: Vasu Mariyala

 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
Assignee: Vasu Mariyala
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10416-rev1.patch, HBASE-10416.patch







[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-24 Thread Vasu Mariyala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasu Mariyala updated HBASE-10416:
--

Attachment: HBASE-10416.patch

Attaching the patch for the above-mentioned issues.

 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
 Attachments: HBASE-10416.patch


 The following improvements can be made to the Import logic:
 a) Make the import extensible: change the filter from a static member of 
 Import to an instance variable of the mapper, and make the mappers and 
 variables of interest protected.
 b) Make sure that Import calls the filterRowKey method of the filter. This is 
 useful for filtering an organization's data by row key, or when using 
 filters such as PrefixFilter. The existing test case 
 TestImportExport#testWithFilter relies on this behavior, but has passed so 
 far only because a single row is inserted into the table.
 c) Provide an option to specify the durability during the import. Setting 
 the Durability to SKIP_WAL would improve restore performance considerably. 
 [~lhofhansl] suggested that this should be a parameter to the import.
 d) Some minor refactoring to avoid building a comma-separated string for the 
 filter args.





[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-24 Thread Vasu Mariyala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasu Mariyala updated HBASE-10416:
--

Status: Patch Available  (was: Open)

 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
 Attachments: HBASE-10416.patch







[jira] [Updated] (HBASE-10416) Improvements to the import flow

2014-01-24 Thread Vasu Mariyala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasu Mariyala updated HBASE-10416:
--

Description: 
Following improvements can be made to the Import logic

a) Make the import extensible (i.e., remove the filter from being a static 
member of Import and make it an instance variable of the mapper, make the 
mappers or variables of interest protected. )

b) Make sure that the Import calls filterRowKey method of the filter (Useful if 
we want to filter the data of an organization based on the row key or using 
filters like PrefixFilter which filter the data in filterRowKey method rather 
than the filterKeyValue method). The existing test case in 
TestImportExport#testWithFilter works with this assumption but is so far 
successful because there is only one row inserted into the table.

c) Provide an option to specify the durability during the import (Specifying 
the Durability as SKIP_WAL would improve the performance of restore 
considerably.) [~lhofhansl] suggested that this should be a parameter to the 
import.

d) Some minor refactoring to avoid building a comma separated string for the 
filter args.

  was:
Following improvements can be made to the Import logic

a) Make the import extensible (i.e., remove the filter from being a static 
member of Import and make it an instance variable of the mapper, make the 
mappers or variables of interest protected. )

b) Make sure that the Import calls filterRowKey method of the filter (Useful if 
we want to filter the data of an organization based on the row key or using 
filters like PrefixFilter). The existing test case in 
TestImportExport#testWithFilter works with this assumption but is so far 
successful because there is only one row inserted into the table.

c) Provide an option to specify the durability during the import (Specifying 
the Durability as SKIP_WAL would improve the performance of restore 
considerably.) [~lhofhansl] suggested that this should be a parameter to the 
import.

d) Some minor refactoring to avoid building a comma separated string for the 
filter args.


 Improvements to the import flow
 ---

 Key: HBASE-10416
 URL: https://issues.apache.org/jira/browse/HBASE-10416
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Vasu Mariyala
 Attachments: HBASE-10416.patch




