RE: [newbie] Confused about PrefixQuery

2005-01-19 Thread Jerry Jalenak
Sorry.  Thought Luke came bundled with Lucene, and I was just missing it.

Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS  66219
(913) 577-1496

[EMAIL PROTECTED]



This transmission (and any information attached to it) may be confidential and
is intended solely for the use of the individual or entity to which it is
addressed. If you are not the intended recipient or the person responsible for
delivering the transmission to the intended recipient, be advised that you
have received this transmission in error and that any use, dissemination,
forwarding, printing, or copying of this information is strictly prohibited.
If you have received this transmission in error, please immediately notify
LabOne at the following email address: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [newbie] Confused about PrefixQuery

2005-01-19 Thread Erik Hatcher
On Jan 19, 2005, at 4:12 PM, Jerry Jalenak wrote:
> Thanks for reply.  Some lists want all the info, some don't.  Just
> thought I'd try to provide as much info as possible  8-)

The info is good... I just push for simple examples :)  By simplifying,
often the problem becomes apparent and trivial.

> That being said, where do I find Luke?

Silly response, but go to Google, type in _luke lucene_ and press "I'm
feeling lucky" :)

But, since I already have the URL handy, here it is:

  http://www.getopt.org/luke/

  Erik


RE: [newbie] Confused about PrefixQuery

2005-01-19 Thread Jerry Jalenak


Never mind.  Stupid, stupid assumption on my part with the data.

Thanks anyway.

Jerry Jalenak





RE: [newbie] Confused about PrefixQuery

2005-01-19 Thread Jerry Jalenak
Erik,

Thanks for reply.  Some lists want all the info, some don't.  Just thought
I'd try to provide as much info as possible  8-)

That being said, where do I find Luke?



Jerry Jalenak





Re: [newbie] Confused about PrefixQuery

2005-01-19 Thread Erik Hatcher
On Jan 19, 2005, at 3:16 PM, Jerry Jalenak wrote:
> The text files have two control lines at the beginning of them - CC>
> and AN>.

That's quite a complex example to ask a user list to decipher.

Simplifying the example, besides making it easier for us to understand,
would likely shed light on the problem.

> Everything (I think) indexes correctly.

To be sure, try Luke out and see what got indexed exactly.  You can
also use Luke as an ad-hoc search tool rather than writing your own.

> When I search against this index, though, I get some weird results,
> especially when using an '*' at the end of my criteria.

The results you got definitely are weird given the query, and in my
initial glance through your code I did not see the issue pop out.  Luke
will likely shed much more light on the matter.

  Erik


[newbie] Confused about PrefixQuery

2005-01-19 Thread Jerry Jalenak
All,

I'm investigating the use of Lucene as a search engine, and have been doing
some 'proof-of-concept' coding today.  I'm indexing about 650 text files,
and then searching against them using QueryParser.  Here's the indexing code
snippet:


public static void Result(IndexWriter indexWriter, File file)
    throws FileNotFoundException
{
    Document document = null;
    String content = "";

    BufferedReader br = new BufferedReader(new FileReader(file));
    boolean EOF = false;

    try
    {
        while (!EOF)
        {
            String s = (String) br.readLine();
            if (null == s)
            {
                EOF = true;
            }
            else
            {
                if (!"".equals(s) &&
                    "CC>".equals(s.substring(0, 3)))
                {
                    document = new Document();

                    document.add(Field.Text("account",
                        s.substring(3, 7)));

                    document.add(Field.Keyword("created",
                        s.substring(s.indexOf("DC>") + 3,
                                    s.indexOf("DC>") + 11)));

                    content = new String();
                }
                else if (!"".equals(s) &&
                    "AN>".equals(s.substring(0, 3)))
                {
                    document.add(Field.Keyword("lastname",
                        s.substring(3, 28).trim().toLowerCase()));

                    document.add(Field.Keyword("firstname",
                        s.substring(28, 43).trim().toLowerCase()));
                    document.add(Field.Text("name",
                        s.substring(28, 43).trim() + " " +
                        s.substring(3, 28).trim()));

                    document.add(Field.Keyword("controlnumber",
                        s.substring(44, 52)));
                    document.add(Field.Keyword("status",
                        s.substring(52, 53).trim()));
                    document.add(Field.Keyword("ssn",
                        s.substring(53, 62)));
                    document.add(Field.Keyword("dob",
                        s.substring(62, 70)));

                    document.add(Field.Keyword("collected",
                        s.substring(137, 145)));
                }
                else if (!"".equals(s) &&
                    "

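The fixed-offset slicing in the snippet above can be isolated and tried
without Lucene.  Note that s.substring(0, 3) throws
StringIndexOutOfBoundsException on a line shorter than three characters,
which the !"".equals(s) check does not rule out.  Below is a minimal
sketch, assuming the AN> column offsets shown above; the class
AnLineParser, its helper field(), and the sample line are hypothetical
and not part of the original code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: slice an "AN>" control line into named fields by column offset. */
final class AnLineParser {

    /** Returns s[start, end) trimmed, or "" when the line is too short to hold the field. */
    static String field(String s, int start, int end) {
        if (s.length() <= start) {
            return "";
        }
        return s.substring(start, Math.min(end, s.length())).trim();
    }

    /** Parses one AN> line; any other line (null, blank, short) gives an empty map. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        // startsWith is safe on lines shorter than the marker, unlike substring(0, 3).
        if (line == null || !line.startsWith("AN>")) {
            return fields;
        }
        // Offsets copied from the indexing snippet in the post.
        fields.put("lastname", field(line, 3, 28).toLowerCase(Locale.ROOT));
        fields.put("firstname", field(line, 28, 43).toLowerCase(Locale.ROOT));
        fields.put("controlnumber", field(line, 44, 52));
        fields.put("status", field(line, 52, 53));
        fields.put("ssn", field(line, 53, 62));
        fields.put("dob", field(line, 62, 70));
        fields.put("collected", field(line, 137, 145));
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, padded to the first two column widths only.
        String line = String.format("AN>%-25s%-15s", "Martin", "Anna");
        System.out.println(parse(line));
    }
}
```

Lines that are blank, too short, or carry another prefix yield an empty
map instead of an exception.  Unlike the original, the helper trims
every field, not just some of them.
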
The text files have two control lines at the beginning of them - CC> and
AN>.  I extract particular fields from these lines and add them to my
document.  Everything (I think) indexes correctly.  When I search against
this index, though, I get some weird results, especially when using an '*'
at the end of my criteria.  Here's the search code snippet:


public static void main(String[] args)
{
    try
    {
        Searcher searcher = new IndexSearcher("c:\\ResultIndex");
        Analyzer analyzer = new StandardAnalyzer();

        BufferedReader br = new BufferedReader(
            new InputStreamReader(System.in));
        while (true)
        {
            System.out.println("Query: ");
            String s = br.readLine();
            if (null == s)
            {
                break;
            }
            else
            {
                Query query = QueryParser.parse(s, "content", analyzer);
                System.out.println("Searching for: " +
                    query.toString("content"));

                Hits hits = searcher.search(query);
                System.out.println("... Found " +
                    hits.length() + " matching documents");
                System.out.println("");

                for (int i = 0; i < hits.length(); i++)
                {
                    Document document = hits.doc(i);
                    System.out.println("Hit " + i + ": Specimen = " +
                        document.get("controlnumber") +
                        ", Account = " + document.get("account") +
                        ", Status = " + document.get("status") +
                        ", Name = " + document.get("name") +
                        ", SSN = " + document.get("ssn") +
                        ", DOB = " + document.get("dob") +
                        ", Collected = " + document.get("collected") +
                        ", Created = " + document.get("created"));

                    //System.out.println(document.get("content"));
                }
            }
        }
    }
    catch (Exception e)
    {
        System.out.println(e.getClass() + " caught with message " +
            e.getMessage());
    }
}
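
For reference, a trailing '*' in QueryParser syntax yields a prefix
query, which matches every term in the named field that begins with the
given characters.  Because lastname is added with Field.Keyword, each
term is the whole lowercased last name, so lastname:mar* would be
expected to select the documents whose last name starts with "mar".
Below is a minimal stdlib-only sketch of that selection over a sorted
set of terms.  It is an illustration only, not Lucene's implementation;
the class PrefixMatchSketch, the method matchPrefix, and the sample
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Sketch: how a prefix selects terms from a sorted term dictionary. */
final class PrefixMatchSketch {

    /** Returns every term that starts with prefix, in sorted order. */
    static List<String> matchPrefix(TreeSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix) begins at the first term >= prefix; in sorted order
        // all terms sharing the prefix are contiguous from that point.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term that shares the prefix
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical last names, lowercased as in the indexing snippet.
        TreeSet<String> lastnames = new TreeSet<>();
        for (String name : new String[] {"Marsh", "Martin", "Mason", "Maris", "Smith"}) {
            lastnames.add(name.toLowerCase(Locale.ROOT));
        }
        System.out.println(matchPrefix(lastnames, "mar"));
    }
}
```

If a real index returns hits outside that set, the terms actually stored
in the field are the first thing to check, which is what Luke shows.
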


When I run this using a criteria string of 

lastname:mar*

I get back the following:

Query: 
lastname:mar*
Se