Arun.

Please keep the discussion on the list.

I don't think what you want is possible as such; use the
regex-urlfilter to achieve it instead.
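
To illustrate, here is a rough Python sketch (not Nutch code) of how I
understand the filter rules to be applied: rules are tried in order, the
first pattern found in the URL decides, "+" accepts and "-" rejects, and a
URL that matches no rule is dropped. The rule list, the url_filter name and
the example.com URLs are made up for illustration.

```python
import re

# Hypothetical rules, mirroring lines one might put in regex-urlfilter.txt.
# "-" rejects, "+" accepts; the first rule whose pattern is found wins.
RULES = [
    ("-", re.compile(r"\.(java|class|jar|dll)$")),  # skip these suffixes
    ("+", re.compile(r".")),                        # accept everything else
]


def url_filter(url):
    """Return the URL if it is accepted, or None if it is rejected."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return url if sign == "+" else None
    return None  # no rule matched: treat the URL as rejected


if __name__ == "__main__":
    for u in (
        "http://example.com/index.html",
        "http://example.com/src/Foo.java",
        "http://example.com/lib/tools.jar",
    ):
        print(u, "->", "fetch" if url_filter(u) else "skip")
```

The "$" anchor limits the reject rule to URLs that end in one of the listed
suffixes, so a path such as /javadoc/index.html is still accepted.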

Rgrds, Thomas

---------- Forwarded message ----------
From: Arun Kumar Sharma <[EMAIL PROTECTED]>
Date: Apr 25, 2006 9:48 AM
Subject: Re: unable to filter different file format like
.java,.jar,.class with nutch version 0.7.2
To: TDLN <[EMAIL PROTECTED]>



Yes, you misunderstood my question. I do not want to fetch anything
for which I did not enable a parse plugin.

I only want to parse HTML and text files, so I put only the
respective parsers for these documents in the plugins directory.

But my search results (and the fetch log) are showing entries
for .java, .class, .jar and .dll files.

I hope this time you got my problem right.


TDLN <[EMAIL PROTECTED]> wrote:

> But earlier, I think with nutch 0.7.1, un-parseable file types were
> neither fetched nor shown in the search results.

Isn't this what you want? If so, just use the regex-urlfilter, I would
say. What is the sense in fetching files that a) can't be parsed and
b) can't be indexed as a result and thus c) will not show in the
search results?

Or am I misunderstanding your question?

Rgrds, Thomas



> Why is the urlfilter not returning "null" for unparsable content?
> Why is it being added to the fetchlist, unlike what happened earlier with
> nutch 0.7.1? FYI, I crawled the same things earlier with nutch 0.7.1, and
> unparseable files were not added to the fetchlist then. Why is it happening
> now? Do you think I have modified something that has a side effect of this kind?
>
>
> TDLN wrote:
>
> > Since there are a number of file formats, I can't add each of them to the
> > ignore list.
>
> Why not? You can add something like
>
> -\.(java|class|jar|dll)$
>
> etc.
>
> Rgrds, Thomas
>
>
>
> > An alternative could be that it fetches and shows results only for parsable
> > documents.
> > Can anybody help me in this regard?
> >
> >
> >
> > Regards,
> > Arun Sharma (Tech Lead-Java/J2EE )
> > www.voltix.com, www.voltixindia.com
> > SCO 13-15, Sector 34A
> > Chandigarh
> >
>
>
>
>
>
>







_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
