Hi George,
I'm answering inline:
Sorry, but I do not understand what's going on here. What modification
are you referring to that "Elena" and you made? Was there some private
email exchange?
You can see Elena's suggestions in
lucene-net-dev Digest of: get.163
I answered 8 days after that, pointing out a FieldDocSortedHitQueue issue in
non-English environments and a fix for it, which was also the start of this
thread.
I also added some lines of VB.NET Code to test the suggestions there.
To make it easy for you to check this again, I attached both messages here,
since the countless quotes and indentations make the bottom of this message
nearly unreadable.
In any case, one other option to provide remote searching with
Lucene.Net is to port the existing solution in 1.4 to 2.0 (or maybe even
1.9.1). If you or someone else has the cycles and wants to take on this task,
let us know and go for it.
Sorting works with MultiSearcher. Make sure you are using the latest
release of 1.9.1 or 2.0 ("final" in both cases.)
I have the latest 1.9.1 here but it does not work for me. I will check
this again and provide a sample showing the issue if I can't
find any mistake on my side.
I can't tell you much about Lucene.Net and WAN since I have not used it (I
don't have a need for it, yet.) Since you say you have written a solution,
and it sounds like a good one, can you contribute it to ASF / Lucene.Net?
If you can do so, make sure you have the appropriate ASF copyright message
on each file, a README.TXT file, a sample / demo and if possible an NUnit
test for the code.
As mentioned in my message before, I can do this. Currently this code is
in an active production test, covering 8 huge fileservers, about 100
indices and about 2TB of indexed data, which are located in Europe, Asia
and the US and connected via VPN-Tunnels.
I expect that I will have to optimize and fix some things during the test
phase, which is scheduled to run until the end of February.
After the tests are finished and the framework can be considered
stable, I will find the time to provide a reusable solution with some
samples.
Regards
Robert
Regards,
-- George Aroush
-----Original Message-----
From: Robert Boulanger [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 02, 2007 6:19 PM
To: [email protected]
Subject: Re: Remote searching with Lucene - forward progress
Hi Jeff,
thanks for the update.
Here is the status from my side so far:
Up to my last message I worked successfully with the modifications
Elena and I described before. I did nothing else since then, as I waited and
hoped for progress from other sides, but I wondered why the suggested
fixes never went into the releases of 1.9.
Anyhow, another issue I found is that sorting does not seem to work
correctly when using the remote searching features (and maybe when using
MultiSearcher in general). It looks like each index is sorted, but not the
hits collection of the MultiSearcher itself.
But the major issue I found was that remote searches over a WAN (the
Internet or a VPN, for example) take about 100 times as long as the same query
within a LAN (7 seconds instead of 0.07 seconds). So I think the Lucene
remote query relies on heavy bidirectional network traffic: not
a lot of data, but a lot of single calls, which makes it slow in
a WAN environment.
Therefore I wrote my own client/server wrapper for this, which does everything
in a single call to each remote index, and which now also works again
with Lucene 1.3 if necessary.
I'm also able to do this in a cascading way: each query server can be
configured to forward the query to other servers, and those to others again,
and so on. It is ensured that endless loops are not possible (server a
calls b, which calls a again), and the API allows passing a parameter
that defines how deep (in the hierarchy of configured servers) the search
should be forwarded. The end result again has correct
sorting. I also don't use any MultiSearchers here, just normal IndexReaders.
The whole architecture has nothing to do with Lucene itself, except for the
fact that Lucene is used for searching, but if anybody is interested in this,
let me know. I can build a template or example of how to do this and post it
somewhere.
Cheers
Robert
Jeff Rodenburg wrote:
Hi Robert, et al. -
No, I've not missed updating the list. I've been a bit busy with
other things but have been working to resolve some serialization
issues that are down in the core of .Net Remoting. The Lucene 2.0
codebase has been problematic inside of the remoting architecture.
Rather than continue to update the list with notifications about a
lack of progress, I've opted to attempt to address those issues and
make an announcement when I'd reached success.
So, no news for now.
thanks,
jeff
On 12/3/06, Robert Boulanger <[EMAIL PROTECTED]> wrote:
Hi Jeff,
concerning the message thread below, which I began in August this
year, I wonder if there has been any progress on your side so far.
Maybe I missed something on the mailing list (which I expect), since I
was busy with other stuff, but the last note from you concerning
remote search that I can find here was from September 13th.
So, since I'm on this topic again, I just want to know whether you
released anything in the past months that I'm just not seeing, or if
you are still on the issue you described in your last note.
thanks for replying
best regards
--Robert
Jeff Rodenburg wrote:
An update on the Remote Searching project I'm bringing forward.
I've completed the base code for hand-off to the community. I'm
presently working through a remoting/serialization issue that's
popped up recently.
This appears to be something new in the Lucene 2.0 release. I'm working
through that issue now, but I have no expectation of when that's resolved.
Rather than release a non-working system, I'm going to resolve this
problem first. Once things are working appropriately, I'll send
out a release message.
Thanks and if you have remoting experience and suggestions, feel free to
ping me. :-)
cheers,
jeff r.
On 9/7/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
All -
Another update on the remote searching application code that's
been mentioned in this thread. I'm near completion of the entire
collection of files that are needed for this project -- libraries,
applications, unit tests, and documentation. There's quite a bit to this,
and thanks for everybody's patience as I assemble the code into something
that's less than confusing. There are several working pieces, so I'm
packaging it for consumption.
I expect to have this available sometime in the next few days, barring
things like my life and regular job from getting in the way.
Again, I'll share an announcement to the list when I've made the
files available.
Thanks,
jeff r.
On 8/26/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
As promised, an update to the list.
I have code ready for delivery, if I can get svn access to the contrib
section. A request has been made for this but it's going nowhere, so I'm
going to find another place to host the files.
There's quite a bit of documentation behind this so I'm working
diligently to explain how this works. If anyone has a place to hold the
code until the uber-powers at apache decide to grant me access, we would
greatly appreciate the assistance.
cheers,
jeff r.
On 8/23/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote:
Just a follow-up to everyone on this topic. I received a lot
of offlist mail about this, so this message has a rather wide
distribution.
I'm in process of modifying the code for our distributed
search components so that they're generic enough for general usage and
public consumption. This is taking a little of my time, but nonetheless
I expect to complete it soon.
As for distributing the code, it will be located in the
contrib portion of the Lucene.Net repository at apache.org. There is some
logistic work involved, but ideally this is moving forward.
As soon as I have more information to relay, I'll pass it along to the
list.
cheers,
jeff r.
On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote:
Hello all -
I've been watching this thread to follow the direction and thought I
might be able to offer some assistance. I run a search system that involves
4 separate search servers -- 3 serving search objects via RemoteSearchable,
and a 4th that serves in an index updating role.
The codebase for Lucene.Net provides all the library routines one
needs to provide distributed search capabilities, but does not provide
facilities for distributed search operation -- nor should it. The ideas
presented here are certainly possible; I've implemented a working operation
without requiring the changes described here. I'm confident in our
implementation; for the calendar year, our uptime/availability of search
services is 99.99%. Our only outage was related to network
hardware, otherwise we're sitting solid at 100%.
I've been authorized to provide our operational code for distributed
search under Lucene.Net to the community at large. Some of the code
is customized to our operation, but for the most part it's rather generic.
We started the project under Lucene v1.4.3, but the
operational aspect still applies under v1.9.
The system consists of a LuceneServer, which provides searchability
against indexes as defined in XML configuration files. In addition, an
IndexUpdateServer provides master index updating, master/slave index
replication and automated index maintenance. Integration with our web site
ensures the index stays available, updated and current. There's a great
deal of applied knowledge and learned behavior of many of the underlying
sub-system components that distributed search under Lucene.Net makes
use of -- .Net remoting, garbage collection, etc.
If anyone has interest, please reply. Contributing this
code requires a little cleanup of our customization work, so my
response may not be immediate but I would make efforts to release the code in
short order.
thanks,
jeff r.
On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote:
--- Begin Message ---
Hi Elena, hi Rest,
Dear All,
The application I am working on is intended to make use of the
distributed search capabilities of the Lucene library. While trying to
work with Lucene’s RemoteSearchable class, I faced some problems
caused by the current Lucene implementation. In the following I’ll try to
describe them, as well as the possible solutions I
identified. The most important question for me is whether these changes
have a chance to be integrated in the coming Lucene versions, such
that remote searches would really become feasible. I would appreciate
any feedback.
Same problem for me and I found some more issues which I explain below:
The first problem concerns the construction of the RemoteSearchable
object. .Net framework allows for both, server and client activation
models of the remote objects. Currently, RemoteSearchable class
possesses only one constructor that requires knowledge of a local
Searchable object:
public RemoteSearchable(Lucene.Net.Search.Searchable local)
I just added a new constructor to RemoteSearchable
public RemoteSearchable(): base()
{
this.local = this.local;
}
Not the finest approach, but it works for me so far.
Since this “local” object is located on the server, knowledge of the
server’s index paths is needed for its creation. However, there are at
least some scenarios where only the server, but not the client, knows
where the indexes are stored on the server side. I think this problem
could be solved by extending RemoteSearchable class with a standard
constructor that reads the names of the indexes to be published out of
a configuration file on the server side.
My "server" now implements a class which inherits directly from
RemoteSearchable.
In the parameterless constructor there I read the server-side
config file which contains the index location, create a new IndexSearcher
and pass it as argument to MyBase.New().
See the sample below.
2. Bug in Term construction
[snip]
This whole chapter was very useful and I can confirm that everything works
fine from there on.
But there is still a bug in FieldDocSortedHitQueue, line 130 and below:
I figured out that the conversions do not work when the system is
running in a non-English globalization context.
The string in docA.fields[i], which might be for example 1.345678, is
converted to 1345678.0, since the decimal sign is misinterpreted on German
systems, as it seems.
So the conversion results in an overflow.
So I changed it as follows:
case SortField.SCORE:
    float r1 = (float)Convert.ToSingle(docA.fields[i],
        System.Globalization.NumberFormatInfo.InvariantInfo);
    // r2 has to be read from docB; reading docA twice makes every comparison a tie
    float r2 = (float)Convert.ToSingle(docB.fields[i],
        System.Globalization.NumberFormatInfo.InvariantInfo);
    if (r1 > r2)
        c = -1;
    if (r1 < r2)
        c = 1;
    break;
The same in lines 172 and 174:
float f1 = (float)Convert.ToSingle(docA.fields[i],
System.Globalization.NumberFormatInfo.InvariantInfo);
//UPGRADE_TODO: The equivalent in .NET for method
'java.lang.Float.floatValue' may return a different value.
"ms-help://MS.VSCC.v80/dv_commoner/local/redirect.htm?index='!DefaultContextWindowIndex'&keyword='jlca1043'"
float f2 = (float)Convert.ToSingle(docB.fields[i],
System.Globalization.NumberFormatInfo.InvariantInfo);
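To reproduce the effect outside of Lucene.Net, here is a small self-contained C# sketch. ScoreParsing, ParseInvariant and CompareScores are names made up for this example and are not part of FieldDocSortedHitQueue; the demo switches the current culture to de-DE only to simulate a German system.

```csharp
using System;
using System.Globalization;

public static class ScoreParsing
{
    // Parse with the invariant culture, so '.' is always the decimal
    // separator, regardless of the regional settings of the machine.
    public static float ParseInvariant(object field)
    {
        return Convert.ToSingle(field, CultureInfo.InvariantCulture);
    }

    // Same ordering as in the SCORE case above: the higher score sorts first.
    public static int CompareScores(object fieldA, object fieldB)
    {
        float r1 = ParseInvariant(fieldA);
        float r2 = ParseInvariant(fieldB);
        if (r1 > r2) return -1;
        if (r1 < r2) return 1;
        return 0;
    }

    public static void Main()
    {
        // Simulate a German system for the current thread.
        CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("de-DE");

        // Culture-sensitive conversion: under de-DE the '.' is a group separator.
        Console.WriteLine(Convert.ToSingle("1.345678"));

        // Culture-invariant conversion: the '.' is the decimal separator.
        Console.WriteLine(ParseInvariant("1.345678"));
        Console.WriteLine(CompareScores("1.345678", "0.5"));
    }
}
```

CultureInfo.InvariantCulture is used here in place of NumberFormatInfo.InvariantInfo; both select the culture-independent number format.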
A tiny Client Server Solution now looks like this (Here in VB.NET)
SERVER:
Imports System.Runtime.Remoting
Imports System.Runtime.Remoting.Channels
Imports System.Runtime.Remoting.Channels.Http
Imports Lucene.Net.Search
Public Class RemoteQuery
    Inherits RemoteSearchable
    ' Parameterless constructor, so that the server can create the object itself
    Public Sub New()
        MyBase.New(New IndexSearcher("C:\lucene\index"))
    End Sub
    Public Sub New(ByVal local As Searchable)
        MyBase.New(local)
    End Sub
End Class
Module Module1
    Public Sub Main(ByVal args As System.String())
        Dim chnl As New HttpChannel(8888)
        ChannelServices.RegisterChannel(chnl, False)
        Dim indexName As System.String = Nothing
        RemotingConfiguration.RegisterWellKnownServiceType(GetType(RemoteQuery), _
            "Searchable", WellKnownObjectMode.Singleton)
        ' Keep the server alive until Enter is pressed
        System.Console.ReadLine()
    End Sub
End Module
CLIENT
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.QueryParsers
Imports Lucene.Net.Search
Sub Main()
    Dim searchables As Lucene.Net.Search.Searchable() = New _
        Lucene.Net.Search.Searchable() {LookupRemote()}
    Dim searcher As Searcher = New MultiSearcher(searchables)
    Dim sort As New Lucene.Net.Search.Sort
    sort.SetSort(Lucene.Net.Search.SortField.FIELD_SCORE)
    Dim query As Query = QueryParser.Parse("Harry", "body", New _
        StandardAnalyzer())
    Dim result As Hits = searcher.Search(query, sort)
End Sub
Private Function LookupRemote() As Lucene.Net.Search.Searchable
    ' Get a proxy for the Searchable published by the server above
    Return CType(Activator.GetObject(GetType(Lucene.Net.Search.Searchable), _
        "http://192.168.8.7:8888/Searchable"), Lucene.Net.Search.Searchable)
End Function
Hope this helps you and anybody else who has had problems with remote search
so far.
BTW: this all refers to version 1.9rc1
--Robert Boulanger
--- End Message ---
--- Begin Message ---
lucene-net-dev Digest of: get.163
Topics (messages 163 through 163):
Remote searches with Lucene
163 by: Elena Demidova
--- Begin Message ---
Dear All,
The application I am working on is intended to make use of the
distributed search capabilities of the Lucene library. While trying to
work with Lucene’s RemoteSearchable class, I faced some problems
caused by the current Lucene implementation. In the following I’ll try to
describe them, as well as the possible solutions I
identified. The most important question for me is whether these changes have
a chance to be integrated in the coming Lucene versions, such that
remote searches would really become feasible. I would appreciate any
feedback.
Best wishes,
Elena Demidova
Now to the problems themselves:
1. Architecture issue
The first problem concerns the construction of the RemoteSearchable
object. .Net framework allows for both, server and client activation
models of the remote objects. Currently, RemoteSearchable class
possesses only one constructor that requires knowledge of a local
Searchable object:
public RemoteSearchable(Lucene.Net.Search.Searchable local)
Since this “local” object is located on the server, knowledge of the
server’s index paths is needed for its creation. However, there are at
least some scenarios where only the server, but not the client, knows
where the indexes are stored on the server side. I think this problem
could be solved by extending RemoteSearchable class with a standard
constructor that reads the names of the indexes to be published out of a
configuration file on the server side.
2. Bug in Term construction
Another problem occurs when you try to call a function of a
RemoteSearchable object. The only function that really works correctly
is the MaxDoc() function. If you ask, for instance, for the document
frequency using DocFreq(new Term("field", "value")), you’ll always get
“0” out of it. The reason is that all values that are passed
as arguments (and return values) of remote calls need to be
correctly serialized. For the DocFreq function this argument is the Term
object, which cannot be correctly reconstructed on the server side. The
constructor of the Term object performs an additional “intern” operation on
the field names, which is not called during the default serialization.
Thus the field names contained in the reconstructed Term object are not
comparable with those in the index.
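The effect of a missing intern step can be shown with plain .NET strings. The following C# sketch does not use Lucene.Net; InternDemo and FreshCopy are names invented for this example, and it assumes, as described above, that the field names are compared by reference somewhere on the server side.

```csharp
using System;

public static class InternDemo
{
    // Mimics a field name that arrives through default deserialization:
    // the same characters, but a fresh string instance that is not interned.
    public static string FreshCopy(string s)
    {
        return new string(s.ToCharArray());
    }

    public static void Main()
    {
        string indexed = String.Intern("field");
        string received = FreshCopy("field");

        // Value equality still holds.
        Console.WriteLine(indexed == received);                       // True
        // Reference identity does not, because 'received' was never interned.
        Console.WriteLine(Object.ReferenceEquals(indexed, received)); // False
        // Interning on the receiving side restores the shared instance.
        Console.WriteLine(Object.ReferenceEquals(indexed, String.Intern(received))); // True
    }
}
```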
This problem can be solved by customizing the serialization procedure
for objects of the Term class. To do that, the Term class
should implement the ISerializable interface and its
serialization function "GetObjectData". The class itself needs to store
the “intern” value passed to its constructor, since this knowledge is
required for the correct reconstruction of the object. The function
GetObjectData then describes how the object is serialized. An additional
deserialization constructor then allows for the correct reconstruction
of the object. Both operations are called automatically during the
remote call execution. In the following, the necessary code changes in the
Term class are presented:
//requires: using System.Runtime.Serialization;
//add derivation from the ISerializable interface
[Serializable()]
public sealed class Term : System.IComparable, ISerializable
…
//store the object’s intern value needed by the constructor
private bool intern;
internal Term(System.String fld, System.String txt, bool intern)
{
    …
    //store the object’s intern value
    this.intern = intern;
}
//Serialization function
public void GetObjectData(SerializationInfo info, StreamingContext context)
{
    info.AddValue("field", field);
    info.AddValue("text", text);
    info.AddValue("intern", intern);
}
//Deserialization constructor.
public Term(SerializationInfo info, StreamingContext ctxt)
{
    String fld = (String)info.GetValue("field", typeof(String));
    this.intern = (bool)info.GetValue("intern", typeof(bool));
    //re-apply the intern step that default deserialization skips
    this.field = intern ? String.Intern(fld) : fld;
    this.text = (String)info.GetValue("text", typeof(String));
}
--- End Message ---
--- End Message ---