[jira] [Commented] (NUTCH-2375) Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce

2017-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137931#comment-16137931
 ] 

ASF GitHub Bot commented on NUTCH-2375:
---

Omkar20895 commented on a change in pull request #188: NUTCH-2375 Upgrade the 
code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce
URL: https://github.com/apache/nutch/pull/188#discussion_r134567680
 
 

 ##
 File path: src/java/org/apache/nutch/fetcher/Fetcher.java
 ##
 @@ -93,39 +103,23 @@
   public static class InputFormat extends
   SequenceFileInputFormat {
 /** Don't split inputs, to keep things polite. */
-public InputSplit[] getSplits(JobConf job, int nSplits) throws IOException {
-  FileStatus[] files = listStatus(job);
-  FileSplit[] splits = new FileSplit[files.length];
-  for (int i = 0; i < files.length; i++) {
-FileStatus cur = files[i];
-splits[i] = new FileSplit(cur.getPath(), 0, cur.getLen(),
+public InputSplit[] getSplits(JobContext job, int nSplits) throws IOException {
+  Configuration conf = job.getConfiguration();
+  List files = listStatus(job);
 
 Review comment:
   @lewismc These changes were made because in the old API listStatus(job) 
returned a FileStatus array, whereas in the new API it returns a List of 
FileStatus objects. The code was changed accordingly. 
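
For readers following the diff, here is a minimal sketch of the pattern under discussion: building exactly one split per input file while iterating over a list instead of an array. FileInfo and WholeFileSplit are hypothetical stand-ins so the example runs without Hadoop on the classpath; they are not Hadoop classes, and this is not the code from pull request #188.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UnsplitInputSketch {
  /** Hypothetical stand-in for a file status entry: a path and a length in bytes. */
  static final class FileInfo {
    final String path;
    final long length;
    FileInfo(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  /** Hypothetical stand-in for a split covering bytes [start, start + length). */
  static final class WholeFileSplit {
    final String path;
    final long start;
    final long length;
    WholeFileSplit(String path, long start, long length) {
      this.path = path;
      this.start = start;
      this.length = length;
    }
  }

  /** Creates one split per file, each spanning the whole file, so no file is divided. */
  static List<WholeFileSplit> oneSplitPerFile(List<FileInfo> files) {
    List<WholeFileSplit> splits = new ArrayList<>(files.size());
    for (FileInfo cur : files) {
      splits.add(new WholeFileSplit(cur.path, 0, cur.length));
    }
    return splits;
  }

  public static void main(String[] args) {
    List<FileInfo> files = Arrays.asList(
        new FileInfo("segment/part-00000", 1024L),
        new FileInfo("segment/part-00001", 2048L));
    System.out.println(oneSplitPerFile(files).size() + " splits");
  }
}
```

The only structural difference from the array-based version is the enhanced for loop over the list and the growable result list; the one-split-per-file behaviour is unchanged.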
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Upgrade the code base from org.apache.hadoop.mapred to 
> org.apache.hadoop.mapreduce
> --
>
> Key: NUTCH-2375
> URL: https://issues.apache.org/jira/browse/NUTCH-2375
> Project: Nutch
>  Issue Type: Improvement
>  Components: deployment
>Reporter: Omkar Reddy
>
> Nutch is still using the deprecated org.apache.hadoop.mapred dependency. 
> It needs to be updated to the org.apache.hadoop.mapreduce dependency. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (NUTCH-2411) Index-metadata to support indexing multiple values for a field

2017-08-22 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2411:
-
Affects Version/s: 1.13

> Index-metadata to support indexing multiple values for a field 
> ---
>
> Key: NUTCH-2411
> URL: https://issues.apache.org/jira/browse/NUTCH-2411
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.13
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Fix For: 1.14
>
> Attachments: NUTCH-2411.patch
>
>
> {code}
> <property>
>   <name>index.metadata.separator</name>
>   <value></value>
>   <description>
>    Separator to use if you want to index multiple values for a given field. Leave empty to
>    treat each value as a single value.
>   </description>
> </property>
> {code}





[jira] [Updated] (NUTCH-2411) Index-metadata to support indexing multiple values for a field

2017-08-22 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2411:
-
Fix Version/s: 1.14

> Index-metadata to support indexing multiple values for a field 
> ---
>
> Key: NUTCH-2411
> URL: https://issues.apache.org/jira/browse/NUTCH-2411
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.13
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Fix For: 1.14
>
> Attachments: NUTCH-2411.patch
>
>
> {code}
> <property>
>   <name>index.metadata.separator</name>
>   <value></value>
>   <description>
>    Separator to use if you want to index multiple values for a given field. Leave empty to
>    treat each value as a single value.
>   </description>
> </property>
> {code}





[jira] [Updated] (NUTCH-2411) Index-metadata to support indexing multiple values for a field

2017-08-22 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2411:
-
Description: 
{code}
<property>
  <name>index.metadata.separator</name>
  <value></value>
  <description>
   Separator to use if you want to index multiple values for a given field. Leave empty to
   treat each value as a single value.
  </description>
</property>
{code}
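
To illustrate the behaviour the description implies, here is a minimal sketch of how a configured separator could turn one metadata string into several field values. The class and method names are hypothetical, and trimming whitespace and dropping empty parts are assumptions made for the example; this is not the code in NUTCH-2411.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class MetadataSeparatorSketch {
  /**
   * Splits a raw metadata value on the separator. An empty or null separator
   * keeps the raw value as a single value, as the property description says.
   */
  static List<String> splitValues(String raw, String separator) {
    List<String> values = new ArrayList<>();
    if (separator == null || separator.isEmpty()) {
      values.add(raw);
      return values;
    }
    // Quote the separator so characters such as "|" are matched literally.
    for (String part : raw.split(Pattern.quote(separator))) {
      String trimmed = part.trim(); // assumption: surrounding whitespace is not significant
      if (!trimmed.isEmpty()) {
        values.add(trimmed);
      }
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println(splitValues("news, sports, tech", ",").size() + " values with separator");
    System.out.println(splitValues("news, sports, tech", "").size() + " value without separator");
  }
}
```

Quoting the separator matters because String.split takes a regular expression; without it, a separator such as "|" would be read as regex alternation rather than a literal character.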

> Index-metadata to support indexing multiple values for a field 
> ---
>
> Key: NUTCH-2411
> URL: https://issues.apache.org/jira/browse/NUTCH-2411
> Project: Nutch
>  Issue Type: Improvement
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Attachments: NUTCH-2411.patch
>
>
> {code}
> <property>
>   <name>index.metadata.separator</name>
>   <value></value>
>   <description>
>    Separator to use if you want to index multiple values for a given field. Leave empty to
>    treat each value as a single value.
>   </description>
> </property>
> {code}





[jira] [Updated] (NUTCH-2411) Index-metadata to support indexing multiple values for a field

2017-08-22 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2411:
-
Attachment: NUTCH-2411.patch

Patch!

> Index-metadata to support indexing multiple values for a field 
> ---
>
> Key: NUTCH-2411
> URL: https://issues.apache.org/jira/browse/NUTCH-2411
> Project: Nutch
>  Issue Type: Improvement
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Attachments: NUTCH-2411.patch
>
>






[jira] [Created] (NUTCH-2411) Index-metadata to support indexing multiple values for a field

2017-08-22 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2411:


 Summary: Index-metadata to support indexing multiple values for a 
field 
 Key: NUTCH-2411
 URL: https://issues.apache.org/jira/browse/NUTCH-2411
 Project: Nutch
  Issue Type: Improvement
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor





