Regarding Setup Lucine for my site

2003-03-04 Thread Velázquez

The documentation says:

Once you've gotten this far you're probably itching to go. Let's start by creating the 
index you'll need for the web examples. Since you've already set your classpath in the 
previous examples, all you need to do is type "java org.apache.lucene.demo.IndexHTML 
-create -index {index-dir} ..". You'll need to do this from a (any) subdirectory of 
your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get 
a null pointer exception). {index-dir} should be a directory that Tomcat has 
permission to read and write, but is outside of a web accessible context. By default 
the webapp is configured to look in /opt/lucene/index for this index. 

A copy of my site is in:

C:\CopiaSite20030228\

My web application runs on

http://mydomain.com/search/index.jsp

How can I make the Lucene index map the URLs of the indexed files to:

http://mydomain.com/

 

Please help!


Samuel Alfonso Velázquez Díaz
http://www.geocities.com/samuelvd
[EMAIL PROTECTED]


-
Do you Yahoo!?
Yahoo! Tax Center - forms, calculators, tips, and more

Re: Regarding Setup Lucine for my site

2003-03-04 Thread Jeff Linwood
Hi,

I'm not a hundred percent sure I understand what you are asking, but when
you get the results back from Lucene (the hits) it's up to you to format
them to display on a web page - you can always do the modification there
when you display the links to the results.

Jeff


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Regarding Setup Lucine for my site

2003-03-04 Thread Velázquez

Oh, OK. I thought it was going to be something like the Egothor search engine (a 
Java-based search engine). When you create the index there, you issue a command like:
java org.egothor.indexer.mirror.DoTanker /tmp/my_www Project/Egothor/var/www as 
http://localhost:8080
/tmp/my_www is the path to the directory where the index is to be created.
Project/Egothor/var/www is the path to the local files to be indexed.
"as http://localhost:8080" is the prefix that the index will keep in the hit list. 
This way the index will be relative to http://localhost:8080, even if your production 
site is another site.
Thanks for your comments. Anyway, now I know that I have to modify code to do this.
Regards!

Re: Regarding Setup Lucine for my site

2003-03-04 Thread Pinky Iyer

I don't understand the explanation. When I index the documents as described in the 
examples, then run the app and do a sample search, the results point to the directory 
structure, say "c:/filesToIndex/www/", instead of "http://localhost:8080/www/". How can 
this be changed to reflect the website domain, as you mentioned? Could you explain 
again? Say my docs are under the directory c:/filesToIndex/www/ and the website is, as 
you said, http://localhost:8080/. How should I proceed?
Thanks in advance!

Re: Regarding Setup Lucine for my site

2003-03-04 Thread Jeff Linwood
One point to note about Lucene is that it isn't a stand-alone search engine
like Inktomi.  It lets you build a search engine into your application.
You (as a developer) are responsible for writing the code that adds your
content to the index, and for writing the code that displays the search
results to the user. The demo code is great, but it's really just a start
for your applications.

One approach would be to store the path of each file (relative to
c:\myfiles\www) as a field on the document, and then use that path to build
up a link in the search results page. You could add a server name here if
you needed to.

Hope this helps,
Jeff
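
The approach described above could be sketched roughly as follows. This is only an illustration in plain Java, with no Lucene calls: the class name RelativeLinkSketch, both method names, and the example paths are assumptions made for this sketch, not part of Lucene or its demo. The string returned by relativePath is what you would store as a field on each document at index time; link is what the results page would call at display time.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; not part of Lucene or its demo code. */
final class RelativeLinkSketch {

    /** Index time: path of a file relative to the indexed root, using '/' separators. */
    static String relativePath(Path root, Path file) {
        Path rel = root.toAbsolutePath().normalize()
                       .relativize(file.toAbsolutePath().normalize());
        StringBuilder sb = new StringBuilder();
        for (Path part : rel) {                 // iterate over the name elements
            if (sb.length() > 0) sb.append('/');
            sb.append(part.toString());
        }
        return sb.toString();
    }

    /** Display time: join a site base URL and the stored relative path. */
    static String link(String baseUrl, String storedRelativePath) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + storedRelativePath;
    }

    public static void main(String[] args) {
        // Example root and file; both are assumed values for illustration.
        Path root = Paths.get("filesToIndex", "www");
        Path file = root.resolve("some_html").resolve("my_doc.html");
        String stored = relativePath(root, file); // value to store on the document
        System.out.println(link("http://www.mysite.com", stored));
    }
}
```

Because only the relative path is stored, the same index can be shown under a different server name by changing the base URL passed to link.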



Re: Regarding Setup Lucine for my site

2003-03-04 Thread Velázquez

Yes, I have:
1.- The directory with the files to index:
C:/filesToIndex/www/
2.- A path where the search engine's index files will be created, let's say:
C:/index/
3.- An internet domain named www.mysite.com
4.- A web application context that runs at http://www.mysite.com/search

Once all of the above is set up, I want to be able to use the search application:
http://www.mysite.com/search/search.jsp
And I don't want the results I get from the index (item 2) to look like:
Your file is at
C:/filesToIndex/www/some_html/my_doc.html
The results should be:
Your file is at
http://www.mysite.com/some_html/my_doc.html
From the comments I have read (THANK YOU VERY MUCH), I conclude that there is no way to 
generate the index with a custom prefix (such as http://www.mysite.com/ for the 
documents at C:/filesToIndex/www/).
It seems that I have to modify my web application 
(http://www.mysite.com/search/search.jsp) to include some logic that replaces 
"C:/filesToIndex/www/" with "http://www.mysite.com/".
If you could point me to the place in the Lucene source code where I could include this 
logic, and so fix it once and for all, I would appreciate it a lot.
The command I used to generate this index was:
java org.apache.lucene.demo.IndexHTML -create -index index C:\index C:\filesToIndex\www\
Now, in the web application, I have to modify this code:
  IndexSearcher searcher;
  Query query;
  Hits hits;

  // some code after...
  hits = searcher.search(query);

  // walk through the hit list
  for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    String doctitle = doc.get("title");
    String url = doc.get("url");
  }

I have to do something like:
  url = "http://www.mysite.com/" + url.substring("C:/filesToIndex/www/".length());

Regards!!!
And thanks again
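
A slightly more defensive version of the prefix replacement described in this message might look like the sketch below. It is plain Java with no Lucene dependency. The class UrlPrefixRewriter and its method toSiteUrl are hypothetical names chosen for illustration, and the choice to leave non-matching values untouched is an assumption, not something the thread prescribes.

```java
/** Hypothetical display-time helper; names and behaviour are illustrative only. */
final class UrlPrefixRewriter {

    /**
     * Replaces a file-system prefix in a stored path with a URL prefix.
     * Values that do not start with the prefix are returned unchanged.
     */
    static String toSiteUrl(String stored, String filePrefix, String urlPrefix) {
        // Normalise Windows separators so C:\dir\file and C:/dir/file compare equal.
        String path = stored.replace('\\', '/');
        String prefix = filePrefix.replace('\\', '/');
        if (!prefix.endsWith("/")) prefix += "/";
        // Windows paths are case-insensitive, so compare the prefix ignoring case.
        if (!path.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return stored; // not under the indexed root: leave the value untouched
        }
        String base = urlPrefix.endsWith("/") ? urlPrefix : urlPrefix + "/";
        return base + path.substring(prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(toSiteUrl(
                "C:/filesToIndex/www/some_html/my_doc.html",
                "C:/filesToIndex/www/",
                "http://www.mysite.com/"));
    }
}
```

In the results loop, the value read with doc.get("url") would be passed through toSiteUrl before being written into the link.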

Re: Regarding Setup Lucine for my site

2003-03-04 Thread Otis Gospodnetic
Samuel,

A basic understanding of what Lucene is seems to be missing here.
Lucene does not index web pages.
Lucene indexes text.
Lucene is not automatically aware of your web site or your domain.
Lucene is aware only of what you 'feed it' at index time.
If you index files, which IndexDemo does, the Lucene index will have only
information about files (information such as the file path).  Lucene has no
clue that you really want to index your web site.
Even if you could replace C:\. with http:// it wouldn't be a
good solution, as directory structures and file paths do not always map
directly to URLs.

In short, you have a bit more reading to do :)
The information is all there, it just has to be read :(
Good luck!

Otis
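
One way to cope with directories that do not map one-to-one onto URLs is to keep an explicit table of file roots and the URL each one is served under. The sketch below is an assumption-laden illustration in plain Java: the class PathToUrlTable, its methods, and the docs.mysite.com host are all made up for the example, and nothing here is a Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mapping table; the roots and URLs used below are made-up examples. */
final class PathToUrlTable {
    private final Map<String, String> roots = new LinkedHashMap<>();

    /** Registers that files under fileRoot are served under urlRoot. */
    PathToUrlTable map(String fileRoot, String urlRoot) {
        roots.put(withSlash(fileRoot.replace('\\', '/')), withSlash(urlRoot));
        return this;
    }

    /** Returns the URL for a file path, or null when no registered root covers it. */
    String urlFor(String filePath) {
        String path = filePath.replace('\\', '/');
        String bestRoot = null;
        for (String root : roots.keySet()) {
            // Prefer the longest matching root so nested mappings override their parents.
            if (path.startsWith(root)
                    && (bestRoot == null || root.length() > bestRoot.length())) {
                bestRoot = root;
            }
        }
        if (bestRoot == null) return null;
        return roots.get(bestRoot) + path.substring(bestRoot.length());
    }

    private static String withSlash(String s) {
        return s.endsWith("/") ? s : s + "/";
    }

    public static void main(String[] args) {
        PathToUrlTable table = new PathToUrlTable()
                .map("C:/filesToIndex/www", "http://www.mysite.com")
                .map("C:/filesToIndex/www/docs", "http://docs.mysite.com");
        System.out.println(table.urlFor("C:/filesToIndex/www/docs/guide.html"));
        System.out.println(table.urlFor("C:/filesToIndex/www/index.html"));
    }
}
```

The table can be applied either at index time, so that the URL itself is what gets stored, or at display time when the hits are rendered.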




Re: Regarding Setup Lucine for my site

2003-03-05 Thread Catalin
Hi there!
We have almost the same configuration (site, index, paths, etc.) as you.
We took another approach for the search on our site:

use a small crawler to index a list of URLs fed to it,
build the Lucene index, and have the web search app use that index.

For crawling:
http://cvs.cabanova.ro/viewcvs.cgi/indexer/

For the webapp:
http://cvs.cabanova.ro/viewcvs.cgi/wsearch/

Running online:
http://www.anet.ro/search?query=star+wars

The indexer code is based on the i2a websearch application demo
that is listed on the Lucene Jakarta site.

Take a look; you might find something useful!
There is no .zip available for download,
but if somebody requests the .zip
we can put it online.

Have fun!

Catalin


Re: Regarding Setup Lucine for my site

2003-03-05 Thread Eric Anderson

LanRx Network Solutions, Inc.
Providing Enterprise Level Solutions...On A Small Business Budget




Re: Regarding Setup Lucine for my site

2003-03-05 Thread Velázquez

Hi, I'd like to take a look at the webapp WAR file, or a zip or tarball, for wsearch 
and the indexer crawler.
 Catalin <[EMAIL PROTECTED]> wrote:..
for crawling:
http://cvs.cabanova.ro/viewcvs.cgi/indexer/

for webapp:
http://cvs.cabanova.ro/viewcvs.cgi/wsearch/

running online:
http://www.anet.ro/search?query=star+wars



Re: Regarding Setup Lucine for my site

2003-03-05 Thread Pinky Iyer

Thanks for the info. I would also be interested in seeing the zipped code, especially 
for the indexer. This discussion has been a wonderful source of information, especially 
for us beginners. Thanks to one and all. I guess a discussion like this once in a while 
also helps us get up to the level at which the list's discussions usually take place!
I would appreciate it if anybody could point me to the documentation mentioned earlier, 
which sheds light on the complete understanding of Lucene.
Thanks again!
 Catalin <[EMAIL PROTECTED]> wrote:hi there !
we have almost the same configuration (site, index, paths, etc) like you.
we used for our search on the site another approach.

eg: use a small crawler to index some feeded urls,
make the lucene index, make the web search app to use that index.

for crawling:
http://cvs.cabanova.ro/viewcvs.cgi/indexer/

for webapp:
http://cvs.cabanova.ro/viewcvs.cgi/wsearch/

running online:
http://www.anet.ro/search?query=star+wars

the code of the indexer is based on the i2a websearch application demo
that is listed on the lucene jakarta site.

take a look, maybe you will find something useful !
there is no .zip available for download,
but if somebody requests the .zip
we can put it online.

have fun !

Catalin

- Original Message - 
From: Samuel Alfonso Velázquez Díaz 
To: Lucene Users List 
Sent: Wednesday, March 05, 2003 3:16 AM
Subject: Re: Regarding Setup Lucine for my site



Yes, I have:
1.- The directory with the files to index:
C:/filesToIndex/www/

2.- A path where the index files from the search engine will be created, let's say
C:/index/
3.- An internet domain whose name is: www.mysite.com
4.- A web application context that runs at http://www.mysite.com/search

Once all of the above is set up, I want to be able to use the search application:
http://www.mysite.com/search/search.jsp
And I don't want the results that I get from the index (step 2) to look like
Your file is at
C:/filesToIndex/www/some_html/my_doc.html
The results should be:
Your file is at
http://www.mysite.com/some_html/my_doc.html
From the comments I have read (THANK YOU VERY MUCH) I conclude that there is no way to 
generate the index with a custom prefix (such as http://www.mysite.com/ for the 
documents at C:/filesToIndex/www/).
It seems that I have to modify my web application 
(http://www.mysite.com/search/search.jsp) to include some logic that replaces 
"C:/filesToIndex/www/" with "http://www.mysite.com/".
If you could point me to the place in the Lucene source code where I could add this 
logic, and so fix it once and for all, I would appreciate it a lot.
The command I used to generate this index was:
java org.apache.lucene.demo.IndexHTML -create -index index C:\index C:\filesToIndex\www\
Now in the web application I have to modify:

IndexSearcher searcher;
Query query;
Hits hits;

// some code after...
hits = searcher.search(query);

// walk through the hit list
for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    String doctitle = doc.get("title");
    String url = doc.get("url");
}

I have to do something like:
url = "http://www.mysite.com/" + url.substring("C:/filesToIndex/www/".length());
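The prefix replacement described above can be sketched as a small helper that the JSP calls for each hit. This is only an illustration under stated assumptions: the class name UrlMapper and its methods are hypothetical and not part of Lucene, and the sketch assumes that the stored "url" field holds a local file path under a single root directory.

```java
/** Hypothetical helper (not part of Lucene): maps a stored file path to a public URL. */
final class UrlMapper {
    private final String localRoot;  // e.g. "C:/filesToIndex/www/"
    private final String urlPrefix;  // e.g. "http://www.mysite.com/"

    UrlMapper(String localRoot, String urlPrefix) {
        String root = normalize(localRoot);
        // Make sure both prefixes end with a slash so the join never doubles or drops one.
        this.localRoot = root.endsWith("/") ? root : root + "/";
        this.urlPrefix = urlPrefix.endsWith("/") ? urlPrefix : urlPrefix + "/";
    }

    /** Returns the public URL, or the stored path unchanged if it lies outside localRoot. */
    String toUrl(String storedPath) {
        String path = normalize(storedPath);
        // Windows paths are case-insensitive, so compare the prefix ignoring case.
        if (!path.regionMatches(true, 0, localRoot, 0, localRoot.length())) {
            return storedPath;
        }
        return urlPrefix + path.substring(localRoot.length());
    }

    // Use forward slashes throughout so "C:\a\b" and "C:/a/b" compare equal.
    private static String normalize(String p) {
        return p.replace('\\', '/');
    }

    public static void main(String[] args) {
        UrlMapper mapper = new UrlMapper("C:/filesToIndex/www/", "http://www.mysite.com/");
        System.out.println(mapper.toUrl("C:/filesToIndex/www/some_html/my_doc.html"));
    }
}
```

Running main prints http://www.mysite.com/some_html/my_doc.html. In the search page, the call would be applied to the value returned by doc.get("url") before the link is written out.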

Regards!!!
And thanks again
Pinky Iyer wrote:
I don't understand the explanation. When I index the documents as described in 
the examples, then run the app and do a sample search, the results point to the 
directory structure, say "c:/filesToIndex/www/", instead of 
"http://localhost:8080/www/". So how can this be changed to reflect the website domain 
you mentioned? Could you explain again? Say my docs are under the directory 
c:/filesToIndex/www/ and the website is, as you said, http://localhost:8080/; how do I 
proceed?
Thanks in advance!
Samuel Alfonso Velázquez Díaz wrote:
Oh OK, I thought it was going to be something like the egothor search engine 
(a Java-based search engine). When you create the index, you issue a command like:
java org.egothor.indexer.mirror.DoTanker /tmp/my_www Project/Egothor/var/www as 
http://localhost:8080
/tmp/my_www: the path to the directory where the index is to be created.
Project/Egothor/var/www: the path to the local file system files to be indexed.
as http://localhost:8080: the prefix that the index will keep in the hit list. 
This way the index will be relative to http://localhost:8080, even if your production 
site is a different site.
Thanks for your comments; anyway, now I know that I have to modify code to do this.
Regards!
Jeff Linwood wrote:
Hi,

I'm not a hundred percent sure I understand what you are asking, but when
you get the results back from Lucene (the hits) it's up to you to format
them to display on a web page - you can always do the modification there
when you display the links to the results.

Jeff

Re: Regarding Setup Lucine for my site

2003-03-05 Thread maurits van wijland
Catalin,
could you send me a zip file with your implementation?

Thanks,

maurits

Re: Regarding Setup Lucine for my site

2003-03-05 Thread Catalin
hi there all !
the .zip is available (by request) 
at: 
http://dev.cabanova.ro/java/lucene/

have fun !

Catalin


Re: Regarding Setup Lucine for my site

2003-03-05 Thread Velázquez

Samuel Alfonso Velázquez Díaz
http://www.geocities.com/samuelvd
[EMAIL PROTECTED]



Re: Regarding Setup Lucine for my site

2003-03-05 Thread Leo Galambos
On Tue, 4 Mar 2003, Otis Gospodnetic wrote:

> Even if you could replace C:\. with http:// it wouldn't be a
> good solution, as directory structures and file paths do not always map
> directly to URLs.

Yes, but that is not the case for Samuel's configuration, nor for 99.99% of 
the others.

The fact is that Lucene is only a library, plus sandbox utilities of
varying quality. :-)

-g-






Re: Regarding Setup Lucine for my site

2003-03-05 Thread Leo Galambos
> org.apache.lucene.demo.IndexHTML which was provided with the
> documentation. Is there any problem using this demo class for a web
> production site? I'm an application developer and it would be hard to
> understand the whole lucene code to use it. It would be almost impossible

You can use it, but if you need something special (snippets, coloring,
different URL mapping, handling of your local charset, etc.) you must
include code from the sandbox or write it from scratch, AFAIK.

> for my development schedule to try to do this. * Regarding your comment:
> Lucene does not index web pages. I thought lucene's main goal was to index
> web pages (?) and, as an afterthought, it should be able to index text
> files or some other information (for example mail databases). Regards

Lucene *can* index HTML pages, if you use programs that build a Lucene 
index from HTML documents. Such programs exist.

On the other hand, if you extend Lucene with your hacks, you will find out
that the model of Lucene is unknown and many parts are hard-coded. It
boosts speed, but it disallows future enhancements (I could name the
parts, I hope we do not start flamewar here).

> and thanks for your comments!!! I'm considering the egothor search
> engine. I successfully set up a web application for searching my web site
> but I didn't see a mailing list or a forum with the level of

I had a PhD exam, and many questions went through ICQ; you know, it is 
faster for me than e-mails...

-g-





Re: Regarding Setup Lucine for my site

2003-03-05 Thread Otis Gospodnetic
> On the other hand, if you extend Lucene with your hacks, you will
> find out
> that the model of Lucene is unknown and many parts are hard-coded. It
> boosts speed, but it disallows future enhancements (I could name the
> parts, I hope we do not start flamewar here).

I'm all eyes and I'm a serious grown-up with good manners :)
Constructive suggestions for improvement are always welcome.

Thanks,
Otis





Re: Regarding Setup Lucine for my site

2003-03-05 Thread Leo Galambos
> > On the other hand, if you extend Lucene with your hacks, you will
> > find out
> > that the model of Lucene is unknown and many parts are hard-coded. It
> > boosts speed, but it disallows future enhancements (I could name the
> > parts, I hope we do not start flamewar here).
> 
> I'm all eyes and I'm a serious grown-up with good manners :)
> Constructive suggestions for improvement are always welcome.

1. 2 threads per request may improve speed up to 50%

2. Merger is hard coded

3. you cannot use different inverted lists in one index (i.e. pagerank and
doc_id instead of doc_id/prox_handle/freq/...), inverted lists do not
support multilevel skips (see MoZo papers about this topic)

4. you cannot implement dissemination + wrappers for internet servers 
which would serve as static barrels.

5. Document metadata cannot be stored as a programmer wants, he must
translate the object to a set of fields

6. Lucene cannot implement your own dynamization

etc.

-g-





Re: Regarding Setup Lucine for my site

2003-03-05 Thread Tatu Saloranta
On Wednesday 05 March 2003 13:35, Leo Galambos wrote:
> > I'm all eyes and I'm a serious grown-up with good manners :)
> > Constructive suggestions for improvement are always welcome.
>

First a disclaimer: I don't mean to sound too negative. I'm genuinely curious 
about many of the issues you mention. But I'm not sure I really understand 
them. :-)

> 1. 2 threads per request may improve speed up to 50%

Hmm? Could you clarify? During indexing, multithreading may speed things
up (splitting docs to index in 2 or more sets, indexing separately, combining
indexing). But... isn't that a good thing? Or are you saying that it'd be good 
to have multi-threaded search functionality for single search? (in my 
experience searching is seldom the slow part)

> 2. Merger is hard coded

In a way that is bad because... ?
(ie. what is the specific problem... I assume you mean index merging
functionality?)

...
> 4. you cannot implement dissemination + wrappers for internet servers
> which would serve as static barrels.

Could you explain this bit more thoroughly (or pointers on longer 
explanation)?

> 5. Document metadata cannot be stored as a programmer wants, he must
> translate the object to a set of fields

Yes? I'd think that possibility of doing separate fields is a good thing; 
after all, all a plain text search engine needs to provide (to be considered 
one) is indexing of plain text data, right?
Plus, Lucene is not a Content Management System (or database), but a
content indexing system. As such I'm not sure why storage should not be 
optimized to allow for fast searches (which means flattening contents, 
amongst other things).

That is not to say that things couldn't be improved; it might be a good idea 
to define a small set of base interfaces / classes to make it easier to convert 
from 'objectified' textual data to straightforward indexing.
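A minimal sketch of what such a base interface might look like follows. Everything in it is hypothetical: Indexable, Article, and the field names are invented for illustration and do not exist in Lucene. The idea is only that a domain object flattens itself into named text fields, and an indexer then copies each entry into a Lucene Document (that last step is left out because the Field API differs between Lucene versions).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical contract: an object that can flatten itself into named text fields. */
interface Indexable {
    Map<String, String> toFields();
}

/** Example domain object; the class and its field names are made up for illustration. */
final class Article implements Indexable {
    private final String title;
    private final String author;
    private final String body;

    Article(String title, String author, String body) {
        this.title = title;
        this.author = author;
        this.body = body;
    }

    @Override
    public Map<String, String> toFields() {
        // LinkedHashMap keeps the fields in insertion order, which makes output predictable.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", title);
        fields.put("author", author);
        fields.put("contents", body);
        return fields;
    }

    public static void main(String[] args) {
        Indexable item = new Article("Lucene setup", "Samuel", "How to map paths to URLs");
        // An indexer would add one document field per entry instead of printing it.
        for (Map.Entry<String, String> e : item.toFields().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With this shape the object keeps control of its own persistence, and only the flattened view is handed to the indexing layer.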

FWIW I am actually using Lucene for storing documents that have extensive 
metadata associated, and I don't find the restrictions too bad... but that's 
certainly a matter of taste. :-)

> 6. Lucene cannot implement your own dynamization

(sorry, I must sound real thick here).
Could you elaborate on this... what do you mean by dynamization?

-+ Tatu +-






Re: Regarding Setup Lucine for my site

2003-03-06 Thread Velázquez

For all beginners (as far as I can tell), I found this URL and thought you might want 
to check it out:
http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html
Regards!


Samuel Alfonso Velázquez Díaz
http://www.geocities.com/samuelvd
[EMAIL PROTECTED]



Re: Regarding Setup Lucine for my site

2003-03-06 Thread Leo Galambos
> > 1. 2 threads per request may improve speed up to 50%
> Hmm? Could you clarify? During indexing, multithreading may speed things
> up (splitting docs to index in 2 or more sets, indexing separately, combining
> indexing). But... isn't that a good thing? Or are you saying that it'd be good 
> to have multi-threaded search functionality for single search? (in my 
> experience searching is seldom the slow part)

You may improve both indexing and searching. Indexing, because the merge
operation will lock just one thread and a smaller part of the index while
other threads keep working; searching, because you can distribute
the query to more barrels. In both cases you save up to 50% of the time (I
assume mergefactor=2).
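The search half of that idea, sending one query to several barrels at once and merging the answers, might look roughly like the sketch below. It is not Lucene code: Barrel, ScoredHit and ParallelSearcher are hypothetical names, a barrel is assumed to be anything that can answer a query with scored hits, and the sketch makes no claim about the actual speedup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** A scored result; stands in for a search hit (hypothetical type). */
final class ScoredHit {
    final String url;
    final double score;
    ScoredHit(String url, double score) { this.url = url; this.score = score; }
}

/** Hypothetical abstraction of one index partition ("barrel"). */
interface Barrel {
    List<ScoredHit> search(String query);
}

final class ParallelSearcher {
    /** Sends the query to every barrel on its own thread and merges hits by descending score. */
    static List<ScoredHit> search(List<Barrel> barrels, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, barrels.size()));
        try {
            List<Callable<List<ScoredHit>>> tasks = new ArrayList<>();
            for (Barrel barrel : barrels) {
                tasks.add(() -> barrel.search(query));
            }
            List<ScoredHit> merged = new ArrayList<>();
            // invokeAll blocks until every barrel has answered.
            for (Future<List<ScoredHit>> result : pool.invokeAll(tasks)) {
                merged.addAll(result.get());
            }
            merged.sort(Comparator.comparingDouble((ScoredHit h) -> h.score).reversed());
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a barrel failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Merging by raw score only makes sense if all barrels score on a comparable scale, which is an assumption of this sketch.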

> > 2. Merger is hard coded
> 
> In a way that is bad because... ?
> (ie. what is the specific problem... I assume you mean index merging
> functionality?)

Because you cannot process a mix of local and remote barrels -- all must be
local in Lucene's object model. That is a serious bug IMHO.

> > 4. you cannot implement dissemination + wrappers for internet servers
> > which would serve as static barrels.
> Could you explain this bit more thoroughly (or pointers on longer 
> explanation)?

Read more about dissemination, metasearch engines (i.e. Savvysearch),
dDIRs (i.e. Harvest). BTW, let's go to a pub and we can talk til morning
:) (it is a serious offer, because I would like to know more about IR).

This example is about metasearch (the simplest case of dDIRs): Can Lucene
allow that a barrel (index segment?) is static and a query is solved via
wrapper, that sends the query ${QUERY} to
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=${QUERY} and then
reads the HTML output as a result?
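The first step of such a wrapper, turning a query into a request URL from a template, can be sketched as follows. QueryTemplate is a hypothetical class, not something Lucene provides; fetching the page and parsing the HTML into hits are deliberately left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a request URL from a template containing ${QUERY}. */
final class QueryTemplate {
    private static final String PLACEHOLDER = "${QUERY}";
    private final String template;

    QueryTemplate(String template) {
        if (!template.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("template must contain " + PLACEHOLDER);
        }
        this.template = template;
    }

    /** Substitutes the URL-encoded query for the placeholder. */
    String urlFor(String query) {
        try {
            // String.replace does a literal (non-regex) substitution, so "$" is safe here.
            return template.replace(PLACEHOLDER, URLEncoder.encode(query, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        QueryTemplate google = new QueryTemplate(
                "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=${QUERY}");
        System.out.println(google.urlFor("star wars"));
    }
}
```

For the query "star wars" this prints the template with q=star+wars at the end. A wrapper barrel would then request that URL and translate the returned HTML into its own hit list.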

> > 5. Document metadata cannot be stored as a programmer wants, he must
> > translate the object to a set of fields
> Yes? I'd think that possibility of doing separate fields is a good thing; 
> after all, all a plain text search engine needs to provide (to be considered 
> one) is indexing of plain text data, right?

I talked about metadata. When a metadata object knows how to achieve its own 
persistence, why would one translate anything to fields and then back?
Why would you touch the user's metadata at all? You need flat fields for
indexing, and whatever is around them is not your problem :). Lucene is
something between a CMS and a CIS; you say that it's closer to a CIS, but when
you need metadata in fields, you are closer to a CMS IMHO.

> > 6. Lucene cannot implement your own dynamization
> 
> (sorry, I must sound real thick here).
> Could you elaborate on this... what do you mean by dynamization?

Read more about "Dynamization of Decomposable Searching Problems".
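For readers new to the term: one classic dynamization technique in that literature is the logarithmic (binary-counter) method, which turns a static structure into one that supports inserts by keeping several static pieces of sizes 1, 2, 4, ... and merging them on overflow. Whether this is exactly the scheme meant here is an assumption; the sketch below uses sorted int arrays as the static structure and the hypothetical class name LogarithmicSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Logarithmic-method sketch: a growable set built only from static sorted arrays. */
final class LogarithmicSet {
    // levels.get(i) is either null or a sorted array holding exactly 2^i keys.
    private final List<int[]> levels = new ArrayList<>();

    /** Inserts a key by carrying merged arrays upward, like incrementing a binary counter. */
    void insert(int key) {
        int[] carry = {key};
        for (int i = 0; ; i++) {
            if (i == levels.size()) { levels.add(carry); return; }
            int[] existing = levels.get(i);
            if (existing == null) { levels.set(i, carry); return; }
            carry = merge(existing, carry);  // two arrays of 2^i keys become one of 2^(i+1)
            levels.set(i, null);
        }
    }

    /** The query is decomposable: ask every static piece and combine the answers with OR. */
    boolean contains(int key) {
        for (int[] level : levels) {
            if (level != null && Arrays.binarySearch(level, key) >= 0) return true;
        }
        return false;
    }

    /** Number of non-empty static pieces currently held. */
    int occupiedLevels() {
        int n = 0;
        for (int[] level : levels) if (level != null) n++;
        return n;
    }

    // Standard merge of two sorted arrays into one sorted array.
    private static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```

Each static piece is rebuilt only when it overflows into the next level, so a key takes part in at most about log2(n) merges over n inserts.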

-g-

