Re: Indexing PDF

2011-10-05 Thread Héctor Trujillo
I've uploaded the file here:

http://www.filesonic.com/file/2342166624/Starting_a_Search_Application.pdf

Please try this. Thanks.

2011/10/5 Michael McCandless 

> Hmm, no attachment; maybe it's too large?
>
> Can you send it directly to me?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> 2011/10/5 Héctor Trujillo :
> > This is the file that gives me errors.
> >
> > 2011/10/5 Michael McCandless 
> >>
> >> Can you attach this PDF to an email & send to the list?  Or is it too
> >> large for that?
> >>
> >> Or, you can try running Tika directly on the PDF to see if it's able
> >> to extract the text.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com

Re: Indexing PDF

2011-10-05 Thread Héctor Trujillo
Sorry, you are right: this file was indexed through a .NET web service
client that calls a Java application (a web service), which in turn calls
Solr using SolrJ.

I will try to index it in a different way; maybe that will resolve the
problem.

Thanks

Best regards
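
One way a PDF could end up garbled on a path like this is that some layer in
between treats the binary file as text. That is only an assumption; the thread
never confirms the cause. A minimal, library-free sketch of the effect follows.
The class BinaryTransportDemo, its methods, and the sample bytes are all
hypothetical and are not the code used in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical illustration only: not the web service code from this thread.
final class BinaryTransportDemo {

    // Simulates a layer that wrongly decodes binary data as UTF-8 text
    // and then encodes it again. Invalid byte sequences are replaced
    // during decoding, so the original bytes are lost.
    static byte[] viaTextString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Simulates a layer that carries binary data safely as Base64 text.
    static byte[] viaBase64(byte[] raw) {
        String encoded = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Stand-in for PDF content: an ASCII header followed by bytes
        // that do not form valid UTF-8.
        byte[] raw = {'%', 'P', 'D', 'F',
                      (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        System.out.println("text round trip intact:   "
                + Arrays.equals(raw, viaTextString(raw)));
        System.out.println("Base64 round trip intact: "
                + Arrays.equals(raw, viaBase64(raw)));
    }
}
```

If the real pipeline does something similar, passing the file as raw bytes or
Base64 end to end would avoid it. Running Tika directly on the PDF helps tell
a transport problem apart from a text extraction problem.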




Re: Indexing PDF

2011-10-05 Thread Héctor Trujillo
It seems unreasonable that, in order to index a local file, I have to
reference this local file by a URL.

This isn't a strange file; it is a file downloaded from the Lucid web portal,
called: Starting a Search Application.pdf

This may be an encoding problem or a character set problem. I can open this
file with a PDF reader without any problems, and I don't know why
referencing the file by a URL would fix this. Can you help me?

I'm working with SolrJ, from Java. Has anyone had the same problem with
SolrJ?

Thanks to Paul Libbrecht for your suggestion.

Best regards

2011/10/4 Paul Libbrecht 

> full of boxes for me.
> Héctor, you need another way to reference these!
> (e.g. a URL)
>
> paul
>
>
> On 4 Oct 2011, at 16:49, Héctor Trujillo wrote:
>
> > Hi all, I'm indexing PDF files with SolrJ, and most of them work. But
> > with some files I have problems because they are stored with strange
> > characters. This is the content that got stored:
> > +++
> >
> > Starting a Search Application
> >
> > Abstract
> > Starting
> > a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i
> >
> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
> > Page ii Do You Need Full-text Search?
> > ∞
> > ∞
> > ∞

Indexing PDF

2011-10-04 Thread Héctor Trujillo
Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with
some files I have problems because they are stored with strange characters.
This is the content that got stored:
+++

Starting a Search Application

Abstract
Starting
a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i

Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
Page ii Do You Need Full-text Search?
∞
∞
∞
Starting
a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1


Re: How to delete all of the Indexed data?

2011-09-23 Thread Héctor Trujillo
Hi, I suppose this isn't exactly what you mean, but I'll leave it here
because it could help you.

Is this what you need?

Using SolrJ, I delete all the rows of the index with this command:

solr.deleteByQuery("id:*");

You need to delete all the rows inserted from Nutch, but maybe this helps
you.



Regards,

Hector

2011/9/23 ahmad ajiloo 

> Hi all
> I sent my data from Nutch to Solr for indexing and searching. Now I want to
> delete all of the indexed data sent from Nutch. Can anyone help me?
> thanks
>
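
For reference, a minimal sketch of wiping an index over plain HTTP, without
SolrJ, under stated assumptions. Solr's XML update handler accepts a
<delete><query>...</query></delete> message, and the query *:* matches every
document, whether or not it has an id field. The class name DeleteAllDemo, its
helper methods, and the URL http://localhost:8983/solr/update are assumptions
for illustration, not details from this thread. The commit=true parameter asks
Solr to commit so that the deletion becomes visible. The sketch needs Java 11
or later for java.net.http.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: builds and sends a delete-by-query update message.
final class DeleteAllDemo {

    // Escapes the characters that are special inside XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds Solr's XML update message for a delete-by-query.
    static String deleteByQueryXml(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    // Builds a POST to the update handler; commit=true commits after deleting.
    static HttpRequest buildRequest(String updateUrl, String query) {
        return HttpRequest.newBuilder(URI.create(updateUrl + "?commit=true"))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(deleteByQueryXml(query)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default URL; pass the real update URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8983/solr/update";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(url, "*:*"), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
    }
}
```

With SolrJ the equivalent is solr.deleteByQuery("*:*") followed by
solr.commit(). To remove only the documents that came from Nutch, the query
would have to select them by some field, which depends on a schema this thread
does not describe.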


Re: Problems querying for the keyword "a"

2011-09-20 Thread Héctor Trujillo
Yes, exactly, that is the reason; "the trees didn't let me see the forest".
Thanks for your perfect and fast response.

2011/9/20 Gora Mohanty 

> 2011/9/20 Héctor Trujillo :
> [...]
> >  I created an index and I inserted about ten documents. I defined a field
> > named source, and I created many rows with the value “a” in this field,
> > and then I started to make queries, and then I realized that all the
> > queries that asked for the value “a” always returned zero rows
> [...]
>
> Take a look at your Solr schema in schema.xml, and stopwords.txt.
> It is very likely that "a" is being removed as a stop word.
>
> Normally, one wants this behaviour, otherwise search results would
> be cluttered with matches for simple words like "a", "an", "the", etc.
>
> Regards,
> Gora
>
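
A toy sketch of the behaviour Gora describes, assuming the field type of
"source" applies a stop-word filter both when indexing and when querying. This
is not Lucene's analysis code. StopWordDemo, analyze, and the three-word list
are hypothetical stand-ins for the analyzer chain in schema.xml and the
contents of stopwords.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer chain with a stop-word filter.
final class StopWordDemo {

    // Small illustrative list; a real list would come from stopwords.txt.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // Lowercases the text, splits it on whitespace, and drops stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("a")); // prints []
        System.out.println(analyze("b")); // prints [b]
    }
}
```

Because the term "a" is dropped on both sides, nothing is indexed for it and
the query has no term left to match, hence zero rows. If a field has to match
such values exactly, a field type without stop-word filtering (for example a
plain string type) is the usual choice.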


Problems querying for the keyword "a"

2011-09-20 Thread Héctor Trujillo
  Hi all, I have found something curious while trying out Solr and SolrJ. I
don't know whether this is normal, a reserved word, or possibly a bug. I
can't explain it, so I am writing here to get a reasonable explanation, if
one exists.

 I created an index and inserted about ten documents. I defined a field
named source and created several rows with the value “a” in this field. Then
I started running queries and realized that every query for the value “a”
returned zero rows, when it should return 4 rows, because I inserted 4 rows
with this value. I ran this query with SolrJ from Java, and then with the
example Solr admin web interface that comes with Solr, and I got the same
result: zero rows when I should have got four.

I'm a beginner in Solr, and I don't know whether this is caused by a
tokenizer, a query filter, or some configuration that I'm using and perhaps
should not be using.

The query:

source:a

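And I got this response:

<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 <lst name="params">
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">source:a</str>
  <str name="rows">10</str>
  <str name="version">2.2</str>
 </lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>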

If I query for the keyword "b" (source:b), I get all the results that I
expected.

Thanks to all. I hope someone can explain this peculiar behaviour to me, and
sorry for my ignorance.