[jira] Updated: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployment

2006-04-13 Thread Stefan Groschupf (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-246?page=all ]

Stefan Groschupf updated NUTCH-246:
---

Attachment: injectWithCurTimeMapper.patch

setFetchTime moved to Mapper.

> segment size is never as big as topN or crawlDB size in a distributed 
> deployment
> -
>
>  Key: NUTCH-246
>  URL: http://issues.apache.org/jira/browse/NUTCH-246
>  Project: Nutch
> Type: Bug

> Versions: 0.8-dev
> Reporter: Stefan Groschupf
> Priority: Minor
>  Fix For: 0.8-dev
>  Attachments: injectWithCurTime.patch, injectWithCurTimeMapper.patch
>
> I didn't reopen NUTCH-136 since this may be related to the Hadoop split.
> I tested this on two different deployments (10 tasktrackers + 1 jobtracker, 
> and 9 tasktrackers + 1 jobtracker).
> Defining the number of map and reduce tasks in a mapred-default.xml does not 
> solve the problem. (The file is in nutch/conf on all boxes.)
> We verified that it is not a problem of the maximum URLs per host and also 
> not a problem of the URL filter.
> It looks like the first job of the Generator (Selector) already gets too few 
> entries to process. 
> Maybe this is somehow related to split generation or to configuration inside 
> the distributed jobtracker, since it runs in a different JVM than the jobclient.
> However, we were not able to find the source of this problem.
> I think this should be fixed before releasing Nutch 0.8. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



Re: [ot] binary subversion diffs

2006-04-13 Thread Dawid Weiss


Subversion basically uses plain diff so I believe what you ask for isn't 
possible. But if somebody knows otherwise I'd also appreciate a note.


D.

Stefan Groschupf wrote:

Hi,
does anybody know how to create svn diffs that contain binary content, 
like jars or images?
I was not able to find any useful information in the documentation or 
on the internet.

Thanks for any hints.
Stefan

-
blog: http://www.find23.org
company: http://www.media-style.com




[ot] binary subversion diffs

2006-04-13 Thread Stefan Groschupf

Hi,
does anybody know how to create svn diffs that contain binary content, 
like jars or images?
I was not able to find any useful information in the documentation or 
on the internet.

Thanks for any hints.
Stefan

-
blog: http://www.find23.org
company: http://www.media-style.com




Re: 0.8 release?

2006-04-13 Thread Dawid Weiss

It seems to be identical, Piotr, thanks.
D.

Piotr Kosiorowski wrote:
I had problems with DOS/Unix newlines and some (still unsolved) 
environment settings on my Linux box - I will try to solve it. Anyway, I 
was able to apply the patch on Cygwin. Could you please have a look at 
it so we can be sure I have not applied it incorrectly (I think it is 
correct, but I did it so many times that I want to cross-check).

Regards
Piotr

Dawid Weiss wrote:


What kind of problems? If you need something, let me know.
D.

Piotr Kosiorowski wrote:

I had some problems while applying Dawid's clustering patch (my Linux
environment does not seem to be set up correctly) - but I switched to Cygwin
and it looks OK. I will try to commit it today/tomorrow.
Regards
Piotr

On 4/12/06, Chris Mattmann <[EMAIL PROTECTED]> wrote:

Hi Guys,

 Any progress on the 0.8 release? Was there any resolution about which JIRA
issues to complete before the 0.8 release? We had a bit of conversation
there and some ideas, but no definitive answer...

Thanks for your help, and sorry to pester ;)

Cheers,
 Chris

__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
___

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.









Re: Content-Type inconsistency?

2006-04-13 Thread Jérôme Charron
I would like to come back to this issue:
The Content object holds two content-types:
1. The raw content-type from the protocol layer (http header in case of
http) in the Content's metadata
2. The guessed content-type in a private field content-type.

When a ParseData object is created, it takes only the Content's metadata.
So, the ParseData can only access the raw content type and not the one
guessed.

I suggest one of the following:
1. Add a content-type parameter to the ParseData constructors (so that
Parsers can pass the guessed content-type to ParseData).
2. The Content object stores the guessed content-type in its metadata, in a
special attribute named for instance GUESSED_CONTENT_TYPE, so that the
ParseData can access it.

I think option 1 is really the cleanest way to implement this, but a lot of
code is impacted (all the parsers).
Option 2 has no impact on the APIs, so the code changes are very small.
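
To make the second option concrete, here is a minimal sketch. It does not use the real Nutch classes: Content and ParseData below are simplified, hypothetical stand-ins, the metadata is modeled as a plain Map, and the key names (including GUESSED_CONTENT_TYPE, taken from the proposal above) are only illustrative. The point is that ParseData keeps receiving only the metadata, yet can still read the guessed type from it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the real Nutch classes.
class ContentTypeSketch {

    // Assumed metadata keys; the actual names would be decided in Nutch.
    static final String RAW_CONTENT_TYPE = "Content-Type";
    static final String GUESSED_CONTENT_TYPE = "GUESSED_CONTENT_TYPE";

    // Simplified Content: stores both content types in its metadata.
    static class Content {
        private final Map<String, String> metadata = new HashMap<>();

        Content(String rawType, String guessedType) {
            // Raw type as reported by the protocol layer (e.g. the HTTP header).
            metadata.put(RAW_CONTENT_TYPE, rawType);
            // Option 2: also expose the guessed type through the metadata.
            metadata.put(GUESSED_CONTENT_TYPE, guessedType);
        }

        Map<String, String> getMetadata() {
            return metadata;
        }
    }

    // Simplified ParseData: still constructed from the metadata only.
    static class ParseData {
        private final Map<String, String> metadata;

        ParseData(Map<String, String> contentMetadata) {
            // Copy so later changes to the Content do not leak in.
            this.metadata = new HashMap<>(contentMetadata);
        }

        String getRawContentType() {
            return metadata.get(RAW_CONTENT_TYPE);
        }

        String getGuessedContentType() {
            return metadata.get(GUESSED_CONTENT_TYPE);
        }
    }

    public static void main(String[] args) {
        // A server claims text/plain, while detection guessed text/html.
        Content content = new Content("text/plain", "text/html");
        ParseData parseData = new ParseData(content.getMetadata());
        System.out.println("raw: " + parseData.getRawContentType());
        System.out.println("guessed: " + parseData.getGuessedContentType());
    }
}
```

In this sketch the ParseData constructor signature is unchanged, which is why option 2 would not touch the parsers; option 1 would instead add a constructor parameter that every parser has to supply.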

Suggestions? Comments?

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


Re: 0.8 release?

2006-04-13 Thread Piotr Kosiorowski
I had problems with DOS/Unix newlines and some (still unsolved) 
environment settings on my Linux box - I will try to solve it. Anyway, I 
was able to apply the patch on Cygwin. Could you please have a look at 
it so we can be sure I have not applied it incorrectly (I think it is 
correct, but I did it so many times that I want to cross-check).

Regards
Piotr

Dawid Weiss wrote:


What kind of problems? If you need something, let me know.
D.

Piotr Kosiorowski wrote:

I had some problems while applying Dawid's clustering patch (my Linux
environment does not seem to be set up correctly) - but I switched to Cygwin
and it looks OK. I will try to commit it today/tomorrow.
Regards
Piotr

On 4/12/06, Chris Mattmann <[EMAIL PROTECTED]> wrote:

Hi Guys,

 Any progress on the 0.8 release? Was there any resolution about which JIRA
issues to complete before the 0.8 release? We had a bit of conversation
there and some ideas, but no definitive answer...

Thanks for your help, and sorry to pester ;)

Cheers,
 Chris

__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
___

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.









[jira] Commented: (NUTCH-245) DTD for plugin.xml configuration files

2006-04-13 Thread Jerome Charron (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-245?page=comments#action_12374339 ] 

Jerome Charron commented on NUTCH-245:
--

I would prefer to change the "ugly" parts of the DTD now (before a future 1.0) 
and suggest changing it to something like the following (and changing the 
plugin.xml files and the Plugin Manifest Reader too):






> DTD for plugin.xml configuration files
> --
>
>  Key: NUTCH-245
>  URL: http://issues.apache.org/jira/browse/NUTCH-245
>  Project: Nutch
> Type: New Feature

>   Components: fetcher, indexer, ndfs, searcher, web gui
> Versions: 0.7.2, 0.7.1, 0.7, 0.6, 0.8-dev
>  Environment: Power PC Dual Processor 2.0 Ghz, Mac OS X 10.4, although 
> improvement is independent of environment
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Priority: Minor
>  Attachments: NUTCH-245.Mattmann.patch.txt
>
> Currently, the plugin.xml file does not have a DTD or XML Schema associated 
> with it, and most people just look at an existing plugin's plugin.xml file 
> to determine which elements are allowed, etc. There should be an 
> explicit plugin DTD file that describes the plugin.xml file. I'll look at the 
> code and attach a plugin.dtd file for the Nutch conf directory later today. 
> This way, people can use the DTD file to automatically generate plugin.xml 
> files (using tools such as XMLSpy) that can then be validated. 




Re: 0.8 release?

2006-04-13 Thread Dawid Weiss


What kind of problems? If you need something, let me know.
D.

Piotr Kosiorowski wrote:

I had some problems while applying Dawid's clustering patch (my Linux
environment does not seem to be set up correctly) - but I switched to Cygwin
and it looks OK. I will try to commit it today/tomorrow.
Regards
Piotr

On 4/12/06, Chris Mattmann <[EMAIL PROTECTED]> wrote:

Hi Guys,

 Any progress on the 0.8 release? Was there any resolution about which JIRA
issues to complete before the 0.8 release? We had a bit of conversation
there and some ideas, but no definitive answer...

Thanks for your help, and sorry to pester ;)

Cheers,
 Chris

__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
___

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.





hadoop

2006-04-13 Thread Anton Potehin
Why was ndfs renamed to hadoop?

Why is the subproject containing mapred named hadoop? ;-)