Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal

2008-04-28 Thread Emmanuel Bernard


On  Apr 28, 2008, at 15:03, Sanne Grinovero wrote:


Hello,
Nice, I'm going to work on this as soon as my previous patch gets  
applied.


I've some more very minor issues to add to the new patch if you agree:

service thread is started by:
timer.scheduleAtFixedRate( task, period, period );

I would think the first operation should start immediately, not  
after period.


If you look at the code, the first operation is done synchronously in  
start() we can't strat HSearch until we have a working Directory ready.




Also I would use schedule( task, long, long) instead of  
scheduleAtFixedRate;
the "atFixedRate" version means it will begin sooner if it started  
too late,

we don't require this (correct?).
So I propose:
timer.schedule( task, 0, period );


Well I kinda like scheduleAtFixedRate for the copy tasks, it sounds  
more admin friendly to me. Why do you like schedule better?





Also the timer creates another thread, running concurrently to itself
and skipping task if the previous is still working.
What would I miss if I just let the Timer do the job, without needing
the Executor around? (one less thread and less code to check
current activity as the timer supports that check for us).


In the current scheme, the copy is run every n minutes unless the  
previous tasks takes more than n minutes. In the latter, the task is  
just skipped.
If you don't, then the tasks will pile up and the copy will run over  
and over (while the tasks try to catch up)





Using the executor it would be nice to share the same
executor among several Directory Providers, to have better
resource bounds (serialize the copy for different instances
of FSSlave/Master Providers).
Would a static Executor instance be ugly?


Yes it would be ugly and would probably complicate hot redeployment.  
What would be the benefit really? I don't like very much the idea of  
one copy blocking all other copies. Plus there are some deadlocks  
issues between directories that HSearch deal with internally that  
might interfere as well if you share the same resource pool.


Threads are cheap.




regards,
Sanne



2008/4/28 Emmanuel Bernard <[EMAIL PROTECTED]>:

One JIRA / patch is good
Self-conflict is not always bad "When fight begins within himself,  
a man's

worth something." - Sir Frederick Browning :)



On  Apr 27, 2008, at 08:03, Sanne Grinovero wrote:



I would like to address A and B first; may I
open a single JIRA for both? both issues are related to the same  
files

and I would
like to avoid having to produce two different patches, potentially

conflicting

with myself.






___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal

2008-04-28 Thread Sanne Grinovero
Hello,
Nice, I'm going to work on this as soon as my previous patch gets applied.

I've some more very minor issues to add to the new patch if you agree:

service thread is started by:
timer.scheduleAtFixedRate( task, period, period );

I would think the first operation should start immediately, not after period.
Also I would use schedule( task, long, long) instead of scheduleAtFixedRate;
the "atFixedRate" version means it will begin sooner if it started too late,
we don't require this (correct?).
So I propose:
timer.schedule( task, 0, period );

Also the timer creates another thread, running concurrently to itself
and skipping task if the previous is still working.
What would I miss if I just let the Timer do the job, without needing
the Executor around? (one less thread and less code to check
current activity as the timer supports that check for us).

Using the executor it would be nice to share the same
executor among several Directory Providers, to have better
resource bounds (serialize the copy for different instances
of FSSlave/Master Providers).
Would a static Executor instance be ugly?

regards,
Sanne



2008/4/28 Emmanuel Bernard <[EMAIL PROTECTED]>:
> One JIRA / patch is good
>  Self-conflict is not always bad "When fight begins within himself, a man's
> worth something." - Sir Frederick Browning :)
>
>
>
>  On  Apr 27, 2008, at 08:03, Sanne Grinovero wrote:
>
>
> > I would like to address A and B first; may I
> > open a single JIRA for both? both issues are related to the same files
> > and I would
> > like to avoid having to produce two different patches, potentially
> conflicting
> > with myself.
> >
>
>
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal

2008-04-28 Thread Emmanuel Bernard

One JIRA / patch is good
Self-conflict is not always bad "When fight begins within himself, a  
man's worth something." - Sir Frederick Browning :)


On  Apr 27, 2008, at 08:03, Sanne Grinovero wrote:


I would like to address A and B first; may I
open a single JIRA for both? both issues are related to the same files
and I would
like to avoid having to produce two different patches, potentially  
conflicting

with myself.


___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal

2008-04-27 Thread Sanne Grinovero
Hi Hardy,

A -- nice I'm getting the chance to teach something to you gurus..
yes this is really required, locking is fine, but there are two problems there;
  a) visibility: JVM makes no guarantee another thread will see the changes
made to a variable by another thread; it could see it "later on" and could also
never see the updated value. Also when several variables are updated,
the other thread
could see them updated in different order (so returning another
Dir.Provider than
intended by other state variables). Only "final" instance variables
are to be considered
safe.
  b) variable update atomicity: when updating a reference you are
changing several
byte variables, another thread could read some bytes of a new value and some of
an older version. This depends on architecture and JVM implementation,
but should really be avoided.
  Just remember that all communication between threads must always be
synchronized,
or even better use some of the really good java.util.concurrent.
That's why I am recommending the AtomicReference, this class has been
made exactly
to solve this problem, and on the architectures where it is unneeded
it is just compiled
to the fastest but SAFE implementation; the same considerations are true
for any java type, even primitives: one line of code in java doesn't
mean it's an
atomic op (sorry for repeating the obvious).
For a very good example on this problem:
http://en.wikipedia.org/wiki/Double_checked_locking_pattern

D -- how stupid, I'm sorry I didn't think about that!
thanks, I suppose D is ok then.

As C and D are just little improvements, but A and B look like a bugs
needing a fix with some urgency, I would like to address A and B first; may I
open a single JIRA for both? both issues are related to the same files
and I would
like to avoid having to produce two different patches, potentially conflicting
with myself.

regards,
Sanne

2008/4/27 Hardy Ferentschik <[EMAIL PROTECTED]>:
> Hi Sanne,
>
>
>
> > A) ... but all comunication beetween the client thread calling
> "getDirectory()" and the CopyDirectory
> >
> > which defines the current directory is unprotected; also volatile isn't
> enough
> > to fix it, IMHO an AtomicReference would be more appropriate.
> >
>
>  Is this really required? Is synchronisation not ensured via
> SearchFactoryImplementor.getLockableDirectoryProviders()?
>
>
> > B) ... Also I would like to address the use of finalize() to stop the
> Timer,
> >
> > there is a TODO about it already;
> >
>
>  Fair point. Initially a non daemon thread was used. Once we switched to a
> deamon thread we
>  could probably have got rid of the call to timer.canel().
>
>
>
>
> > C) FileHelper not listening to Thread.interrupt() requests
> > As FileHelper.synchronize is used in the mentioned Timers to copy
> > potentially big files,
> > it would be nice to check for Thread.isInterrupted() and also catch
> > ClosedByInterruptException in file-copy operations so to abort all I/O
> > operations and
> > not just the current one to abort faster on JVM shutdown.
> > (It doesn't look like the copy safety was guaranteed anyway).
> >
>
>  Sounds reasonable.
>
>
>
>
> > D) Master and Slave share configuration properties.
> > Wouldn't it be natural to have two different property keys to manage
> > the shared index path? I imagine the typical scenario would be to have the
> two
> > configured on different servers and copy through some NFS, but it looks
> like
> > we are forcing people to use exactly the same path on both machines as the
> two
> > classes read the same props.
> >
>
>  The properties are different. Yes, the master and the slave both define:
>  hibernate.search.default.sourceBase and hibernate.search.default.indexBase,
> but remember
>  that these are in two different configuration files. Master and slave are
> two seperate machines
>  or at the very least two different JVMs. Nothing stops you to have
> different mount points. I guess the
>  docuemtnation might suggest this.
>
>  You can definitely create a Jira  issue for B and D. Not sure about A
> though.
>  Maybe someone else can have another look at the code.
>
>  --Hardy
>
>
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal

2008-04-27 Thread Hardy Ferentschik

Hi Sanne,


A) ... but all comunication beetween the client thread calling  
"getDirectory()" and the CopyDirectory
which defines the current directory is unprotected; also volatile isn't  
enough

to fix it, IMHO an AtomicReference would be more appropriate.


Is this really required? Is synchronisation not ensured via  
SearchFactoryImplementor.getLockableDirectoryProviders()?


B) ... Also I would like to address the use of finalize() to stop the  
Timer,

there is a TODO about it already;


Fair point. Initially a non daemon thread was used. Once we switched to a  
deamon thread we

could probably have got rid of the call to timer.canel().



C) FileHelper not listening to Thread.interrupt() requests
As FileHelper.synchronize is used in the mentioned Timers to copy
potentially big files,
it would be nice to check for Thread.isInterrupted() and also catch
ClosedByInterruptException in file-copy operations so to abort all I/O
operations and
not just the current one to abort faster on JVM shutdown.
(It doesn't look like the copy safety was guaranteed anyway).


Sounds reasonable.



D) Master and Slave share configuration properties.
Wouldn't it be natural to have two different property keys to manage
the shared index path? I imagine the typical scenario would be to have  
the two
configured on different servers and copy through some NFS, but it looks  
like
we are forcing people to use exactly the same path on both machines as  
the two

classes read the same props.


The properties are different. Yes, the master and the slave both define:
hibernate.search.default.sourceBase and  
hibernate.search.default.indexBase, but remember
that these are in two different configuration files. Master and slave are  
two seperate machines
or at the very least two different JVMs. Nothing stops you to have  
different mount points. I guess the

docuemtnation might suggest this.

You can definitely create a Jira  issue for B and D. Not sure about A  
though.

Maybe someone else can have another look at the code.

--Hardy

___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


[hibernate-dev] Hibernate Search: Master/Slave DirectoryProviders improvements proposal

2008-04-26 Thread Sanne Grinovero
Hello most esteemed developers,
I have been scrutinizing the FSMasterDirectoryProvider and
FSSlaveDirectoryProvider
classes in Hibernate Search and have found some minor glitches and space
for some improvement (IMHO); I would like to hear your opinion about it,
and if you agree I'll open a JIRA and provide a patch myself ASAP.

A) Variable visibility
both FSMasterD.P. and FSSlaveD.P. have some little concurrency problems:
there is a "volatile boolean inProgress" that fixes the communication beetween
the "CopyDirectory" Runnable and the "TriggerTask" Timer, but all comunication
beetween the client thread calling "getDirectory()" and the CopyDirectory
which defines the current directory is unprotected; also volatile isn't enough
to fix it, IMHO an AtomicReference would be more appropriate.
  The locking schema looks great! I've had a hard time understanding the flow
and did some tests about the behaviour of Lucene in case of concurrent
index-deletion and forcing the most dangerous situations I could think of,
I couldn't find anything failing.
  Actually in the current situation the lock works ok but the wrong Directory
could be returned because of the visibility problems; the lock avoids most
problems but could hurt scalability (as we could return locked resources even
whenever an unlocked dir was available).

B) The finalize / stop issue
Also I would like to address the use of finalize() to stop the Timer,
there is a TODO about it already; IMHO the best solution is to remove
the finalize without changing other code: the Timer is declared as
daemon thread already, so it's going to stop automatically on JVM exit.
The current finalize idea could harm the GC, as in the finalize we are
referencing
the timer and instruct the timer to do something: the timer has a reference
to  so this would revive the reference to  and cancel the
garbage-collection;
actually I don't think this finalizer is ever being called, as the timer
also has a reference to this (being an inner class) and is running in another
thread which is certainly alive (as our purpose was to
stop it and we we still didn't get to that point).

So we could fix this issues just removing code, and making some instance
variables (such as the Timer itself) unnecessary.
Also this means you won't need a stop() to be added to
DirectoryProvider interface,
at least not for FSMasterD.P. and FSSlaveD.P. ( which are the only
known implementations
needing it, so we could also remove the TODO in the DirectoryProvider.

C) FileHelper not listening to Thread.interrupt() requests
As FileHelper.synchronize is used in the mentioned Timers to copy
potentially big files,
it would be nice to check for Thread.isInterrupted() and also catch
ClosedByInterruptException in file-copy operations so to abort all I/O
operations and
not just the current one to abort faster on JVM shutdown.
(It doesn't look like the copy safety was guaranteed anyway).

D) Master and Slave share configuration properties.
Wouldn't it be natural to have two different property keys to manage
the shared index path? I imagine the typical scenario would be to have the two
configured on different servers and copy through some NFS, but it looks like
we are forcing people to use exactly the same path on both machines as the two
classes read the same props.

kind regards,
Sanne
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] Hibernate Search master/slave

2007-01-14 Thread Emmanuel Bernard



Max Rydahl Andersen wrote:


> Recently I've been working on Hibernate Search clustering.
> Hibernate Search will have the ability to work in a master / slave mode.
>
> Every node has a local copy of the index and can read it (ie query it). 
> When an update is required (actually when a transaction scoped list of 
> updates are required) by a node, a message is sent to a master node 
> responsible to update the master index.
> On a regular basis, each node acquires a copy of the master index in 
> local to stay up to day asynchronously.

>
> My first implementation involves JMS, but one could imagine using:
>  - JBoss Messaging standalone in headless mode (when it's done)
>  - a servlet controller receiving http requests
>  - whatever you want
>
> Thoughts?

Is this clustering at the level of Hibernate Search or is it something
at the lucence index level ?


Hibernate Search



Would it not make sense to be a "index tech" level concern or ?


No because the index has no notion of transaction and batch of operations.
However I plan sometime to create a JBossCache Directory (index 
container). The scalability is limited though since Lucene needs an 
index based locking system.



--
--
Max Rydahl Andersen
callto://max.rydahl.andersen

Hibernate
[EMAIL PROTECTED]
http://hibernate.org

JBoss a division of Red Hat
[EMAIL PROTECTED]


___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] Hibernate Search master/slave

2007-01-13 Thread Max Rydahl Andersen

Recently I've been working on Hibernate Search clustering.
Hibernate Search will have the ability to work in a master / slave mode.

Every node has a local copy of the index and can read it (ie query it).  
When an update is required (actually when a transaction scoped list of  
updates are required) by a node, a message is sent to a master node  
responsible to update the master index.
On a regular basis, each node acquires a copy of the master index in  
local to stay up to day asynchronously.


My first implementation involves JMS, but one could imagine using:
 - JBoss Messaging standalone in headless mode (when it's done)
 - a servlet controller receiving http requests
 - whatever you want

Thoughts?


Is this clustering at the level of Hibernate Search or is it something
at the lucence index level ?

Would it not make sense to be a "index tech" level concern or ?

--
--
Max Rydahl Andersen
callto://max.rydahl.andersen

Hibernate
[EMAIL PROTECTED]
http://hibernate.org

JBoss a division of Red Hat
[EMAIL PROTECTED]
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


[hibernate-dev] Hibernate Search master/slave

2007-01-12 Thread Emmanuel Bernard

Recently I've been working on Hibernate Search clustering.
Hibernate Search will have the ability to work in a master / slave mode.

Every node has a local copy of the index and can read it (ie query it). 
When an update is required (actually when a transaction scoped list of 
updates are required) by a node, a message is sent to a master node 
responsible to update the master index.
On a regular basis, each node acquires a copy of the master index in 
local to stay up to day asynchronously.


My first implementation involves JMS, but one could imagine using:
- JBoss Messaging standalone in headless mode (when it's done)
- a servlet controller receiving http requests
- whatever you want

Thoughts?

The project moves slowly from a library to a library + infrastructure 
project. The focus will stay on the library side, but I am think of 
having some helpers to build the server side (ie build a ready to deploy 
EAR or WAR)

___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev