Re: dataset collection

2011-03-27 Thread Itamar Syn-Hershko
Mailman seems to have dropped the images. Anyone interested, feel free to
email me privately for them.



Forgot to mention: the template is ASP.NET MVC's default template. At 
this point the valuable part is in the code...



Itamar.


On 28/03/2011 01:41, Itamar Syn-Hershko wrote:


Otis, I nearly forgot you :)

Attached are the screenshots.

The flow is quite obvious, but as a service here's the architecture 
description: https://github.com/synhershko/Orev/blob/master/Orev.png


Honestly there's still quite a lot to do, but the basics are already 
there. The idea is to be able to handle several corpora per language, 
and to be able to have more than one language in the system. Also, we 
should be able to remove judgments based on users, and to have an 
overall smart system of detecting poor judgments (e.g. keep judgments
from a new user in standby until he has judged a few dozen, etc).


There are quite a few questions that come to mind (and some were 
raised before), such as what is the ideal way of scoring (boolean, 
1..5, other system), whether we should trust one judgment per 
doc/topic or we should try crossing, and so on.


My plan is to give this some attention in a few weeks. As you can see, 
it's quite a crucial part of my work on Hebrew search. Your thoughts /
cooperation appreciated.


On 02/03/2011 04:11, Otis Gospodnetic wrote:

Itamar,

Would you happen to have a screenshot that shows what ORev looks like?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----

From: Itamar Syn-Hershko
To: openrelevance-dev@lucene.apache.org
Sent: Sun, February 27, 2011 2:09:02 PM
Subject: Re: dataset collection

Hi Tommaso,


Grant posted a while ago about the ASF mail archive being available from a
cloud store: http://search-lucene.com/m/9udC12n9y5A.


The Orev application (open-relevance viewer) is still under development. I'm
scheduled to resume work on it in a few weeks, and can actually use some help
and feedback. What I have so far is also on github
(https://github.com/synhershko/Orev), and is coded in .NET. I'm planning to
complete this in .NET (and there's still plenty to do, also in terms of design),
unless someone wishes to pick it up and do the actual coding in Java with me
assisting (I'm more fluent with .NET).


Itamar.


On  21/2/2011 6:25 PM, Tommaso Teofili wrote:


Hi ORPers,
I have to use and evaluate a machine learning system for clustering
documents so I am wondering if there is any available dataset used within
ORP I could use.
BTW, is the ORV already in place? May I give you any help with the
development/design of ORP system?
Regards,
  Tommaso





Re: dataset collection

2011-03-27 Thread Itamar Syn-Hershko

Otis, I nearly forgot you :)

Attached are the screenshots.

The flow is quite obvious, but as a service here's the architecture 
description: https://github.com/synhershko/Orev/blob/master/Orev.png


Honestly there's still quite a lot to do, but the basics are already 
there. The idea is to be able to handle several corpora per language, 
and to be able to have more than one language in the system. Also, we 
should be able to remove judgments based on users, and to have an 
overall smart system of detecting poor judgments (e.g. keep judgments
from a new user in standby until he has judged a few dozen, etc).
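
The standby idea and per-user removal described above can be sketched
roughly as follows. This is only an illustration in Python, not code from
the Orev repository: the class names, the boolean `relevant` field and the
threshold of 24 judgments are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Judgment:
    user: str
    topic_id: str
    doc_id: str
    relevant: bool  # assumption: boolean scoring; a 1..5 scale would also fit


@dataclass
class JudgmentStore:
    # Hypothetical threshold standing in for "a few dozen" judgments.
    min_judgments: int = 24
    _by_user: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, judgment: Judgment) -> None:
        """Record a judgment under the user who made it."""
        self._by_user[judgment.user].append(judgment)

    def remove_user(self, user: str) -> int:
        """Drop every judgment made by `user`; return how many were removed."""
        return len(self._by_user.pop(user, []))

    def active(self) -> list:
        """Judgments from users who have reached the standby threshold."""
        return [j for js in self._by_user.values()
                if len(js) >= self.min_judgments for j in js]

    def standby(self) -> list:
        """Judgments held back because their user is still below the threshold."""
        return [j for js in self._by_user.values()
                if len(js) < self.min_judgments for j in js]
```

Grouping judgments by user keeps both operations cheap: removing a user
is a single dictionary pop, and a user's judgments move from standby to
active together once the threshold is reached.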


There are quite a few questions that come to mind (and some were raised 
before), such as what is the ideal way of scoring (boolean, 1..5, other 
system), whether we should trust one judgment per doc/topic or we should 
try crossing, and so on.


My plan is to give this some attention in a few weeks. As you can see, 
it's quite a crucial part of my work on Hebrew search. Your thoughts /
cooperation appreciated.


On 02/03/2011 04:11, Otis Gospodnetic wrote:

Itamar,

Would you happen to have a screenshot that shows what ORev looks like?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----

From: Itamar Syn-Hershko
To: openrelevance-dev@lucene.apache.org
Sent: Sun, February 27, 2011 2:09:02 PM
Subject: Re: dataset collection

Hi Tommaso,


Grant posted a while ago about the ASF mail archive being available from a
cloud store: http://search-lucene.com/m/9udC12n9y5A.


The Orev application (open-relevance viewer) is still under development. I'm
scheduled to resume work on it in a few weeks, and can actually use some help
and feedback. What I have so far is also on github
(https://github.com/synhershko/Orev), and is coded in .NET. I'm planning to
complete this in .NET (and there's still plenty to do, also in terms of design),
unless someone wishes to pick it up and do the actual coding in Java with me
assisting (I'm more fluent with .NET).


Itamar.


On  21/2/2011 6:25 PM, Tommaso Teofili wrote:


Hi ORPers,
I have to use and evaluate a machine learning system for clustering
documents so I am wondering if there is any available dataset used within
ORP I could use.
BTW, is the ORV already in place? May I give you any help with the
development/design of ORP system?
Regards,
  Tommaso





Re: dataset collection

2011-03-01 Thread Otis Gospodnetic
Itamar,

Would you happen to have a screenshot that shows what ORev looks like?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Itamar Syn-Hershko 
> To: openrelevance-dev@lucene.apache.org
> Sent: Sun, February 27, 2011 2:09:02 PM
> Subject: Re: dataset collection
> 
> Hi Tommaso,
> 
> 
> Grant posted a while ago about the ASF mail archive being available from a
> cloud store: http://search-lucene.com/m/9udC12n9y5A.
> 
> 
> The Orev application (open-relevance viewer) is still under development. I'm
> scheduled to resume work on it in a few weeks, and can actually use some help
> and feedback. What I have so far is also on github
> (https://github.com/synhershko/Orev), and is coded in .NET. I'm planning to
> complete this in .NET (and there's still plenty to do, also in terms of design),
> unless someone wishes to pick it up and do the actual coding in Java with me
> assisting (I'm more fluent with .NET).
> 
> 
> Itamar.
> 
> 
> On  21/2/2011 6:25 PM, Tommaso Teofili wrote:
> 
> > Hi ORPers,
> > I have  to use and evaluate a machine learning system for clustering
> > documents  so I am wondering if there is any available dataset used within
> > ORP I  could use.
> > BTW, is the ORV already in place? May I give you any help  with the
> > development/design of ORP system?
> > Regards,
> >  Tommaso
> > 
> 


Re: dataset collection

2011-03-01 Thread Tommaso Teofili
Hello Itamar,

2011/2/27 Itamar Syn-Hershko 

> Grant posted a while ago about the ASF mail archive being available from a
> cloud store: http://search-lucene.com/m/9udC12n9y5A.
>

thanks! I'll try them out :)


>
>
> The Orev application (open-relevance viewer) is still under development.
> I'm scheduled to resume work on it in a few weeks, and can actually use some
> help and feedback. What I have so far is also on github (
> https://github.com/synhershko/Orev), and is coded in .NET. I'm planning to
> complete this in .NET (and there's still plenty to do, also in terms of
> design), unless someone wishes to pick it up and do the actual coding in
> Java with me assisting (I'm more fluent with .NET).
>
>
I've had only minor experience with .NET, but if design decisions happen on
this mailing list I'll be happy to help.
Cheers,
Tommaso