Re: Indexing PDF
I've uploaded the file here: http://www.filesonic.com/file/2342166624/Starting_a_Search_Application.pdf
Try this, thanks.
2011/10/5 Michael McCandless
> Hmm, no attachment; maybe it's too large?
>
> Can you send it directly to me?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> 2011/10/5 Héctor Trujillo:
> > This is the file that gives me errors.
> >
> > 2011/10/5 Michael McCandless
> >> Can you attach this PDF to an email & send it to the list? Or is it too
> >> large for that?
> >>
> >> Or, you can try running Tika directly on the PDF to see if it's able
> >> to extract the text.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> 2011/10/5 Héctor Trujillo:
> >> > [...]
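Mike's suggestion of running Tika directly on the PDF separates text extraction from indexing: if Tika on its own already produces the stray characters, that suggests the problem is in extraction rather than in SolrJ or the web services. A minimal sketch that shells out to the standalone tika-app jar, assuming you have downloaded it (the jar path is a placeholder you pass in) and that your tika-app version supports the `--text` and `--encoding=UTF-8` options (check its `--help` output). `TikaCheck`, `buildCommand` and `extractText` are hypothetical names:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class TikaCheck {

    // Command line for the standalone tika-app jar: print plain text, encoded as UTF-8.
    static List<String> buildCommand(String tikaJar, String pdfPath) {
        return List.of("java", "-jar", tikaJar, "--text", "--encoding=UTF-8", pdfPath);
    }

    // Runs tika-app on one PDF and returns the text it extracted.
    static String extractText(String tikaJar, String pdfPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(tikaJar, pdfPath))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show Tika's errors on our stderr
                .start();
        // Drain stdout before waiting so a full pipe cannot stall the child process.
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tika-app exited with status " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java TikaCheck <path/to/tika-app.jar> <file.pdf>");
            return;
        }
        System.out.println(extractText(args[0], args[1]));
    }
}
```

If the printed text already contains the same odd characters that ended up in the index, the indexing path is probably not the culprit.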
Re: Indexing PDF
Sorry, you're right: this file was indexed by a .NET web service client that calls a Java application (a web service), which in turn calls Solr using SolrJ.
I will try to index it in a different way; maybe this will resolve the problem.
Thanks
Best regards
On 5 October 2011 at 08:42, Héctor Trujillo wrote:
> It seems unreasonable that if I want to index a local file, I have to
> reference this local file by a URL.
> [...]
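Because this file passes through several hops (a .NET client, a Java web service, then SolrJ), one thing worth ruling out is a hop that carries the PDF as a string instead of as raw bytes. The thread doesn't say how the file is transferred, so this is an assumption, not a diagnosis. The JDK-only sketch below shows that pushing binary bytes through a UTF-8 string is lossy, while Base64 round-trips them exactly; `viaString` and `viaBase64` are hypothetical names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryTransport {

    // Lossy: treats arbitrary PDF bytes as UTF-8 text, as a text-only hop might.
    static byte[] viaString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8); // malformed sequences become U+FFFD
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: Base64 maps every byte sequence to ASCII text and back.
    static byte[] viaBase64(byte[] data) {
        String asText = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(asText);
    }

    public static void main(String[] args) {
        // "%PDF-1.4\n%" followed by four non-ASCII bytes, as many PDF writers emit.
        byte[] header = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n', '%',
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        System.out.println("via String equal: " + Arrays.equals(header, viaString(header)));  // false
        System.out.println("via Base64 equal: " + Arrays.equals(header, viaBase64(header)));  // true
    }
}
```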
Re: Indexing PDF
It seems unreasonable that, if I want to index a local file, I have to reference it by a URL.
This isn't a strange file; it's a file downloaded from the Lucid web portal, called "Starting a Search Application.pdf".
This may be an encoding or character-set problem. I can open this file in a PDF reader without any problems, and I don't know why referencing it by a URL would fix this. Can you help me?
I'm working with SolrJ, from Java. Has anyone had the same problem with SolrJ?
Thanks to Paul Libbrecht for your suggestion.
Best regards
2011/10/4 Paul Libbrecht
> full of boxes for me.
> Héctor, you need another way to reference these!
> (e.g. a URL)
>
> paul
>
> On 4 Oct. 2011 at 16:49, Héctor Trujillo wrote:
>
> > Hi all, I'm indexing PDF files with SolrJ, and most of them work. [...]
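On the encoding question: the same stored byte turns into a different character depending on which charset decodes it. As a generic illustration (not a diagnosis of this PDF, whose font encodings are handled inside the PDF parser), the byte 0xA5 decodes to "¥" under ISO-8859-1, but is malformed under UTF-8 and becomes the replacement character U+FFFD. `CharsetDemo` and `decode` are hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Decodes a single byte with the given charset.
    static String decode(byte b, Charset charset) {
        return new String(new byte[] {b}, charset);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xA5;
        String latin1 = decode(b, StandardCharsets.ISO_8859_1);
        String utf8 = decode(b, StandardCharsets.UTF_8);
        // Print code points rather than glyphs, so the console's own encoding doesn't interfere.
        System.out.printf("ISO-8859-1: U+%04X%n", latin1.codePointAt(0)); // U+00A5 (YEN SIGN)
        System.out.printf("UTF-8:      U+%04X%n", utf8.codePointAt(0));   // U+FFFD (replacement character)
    }
}
```

Every hop that turns bytes into text should therefore use the same, explicitly specified charset rather than a platform default.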
Indexing PDF
Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with some files I've got problems because strange characters get stored. This is the content that was stored:
+++
Starting a Search Application Abstract Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search? ∞ ∞ ∞ Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
Re: How to delete all of the Indexed data?
Hi, I suppose this isn't exactly what you mean, but I'll leave it here because it could help you. Is this what you need? Using SolrJ, I delete all the documents in the index with this command:
solr.deleteByQuery("id:*");
But you need to delete all the documents inserted from Nutch; maybe this helps you.
Regards,
Hector
2011/9/23 ahmad ajiloo
> Hi all
> I sent my data from Nutch to Solr for indexing and searching. Now I want to
> delete all of the indexed data sent from Nutch. Can anyone help me?
> thanks
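For reference, the same kind of delete can also be sent without SolrJ, as an XML update message posted over HTTP. This is a JDK-only sketch (Java 11+ for `java.net.http`); the update URL is an assumption, since the path depends on your Solr version and core name. It uses `*:*`, Solr's match-all query, instead of `id:*`, so it does not depend on every document having an `id` value, and it passes `commit=true` because deletions only become visible to searches after a commit (with SolrJ, that is a `solr.commit()` call after `deleteByQuery`). If the index also holds data that didn't come from Nutch, you would need a narrower query than `*:*`. `DeleteAll`, `deleteByQueryXml` and `postDelete` are hypothetical names:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeleteAll {

    // Escapes the characters that are special inside XML text content (& first).
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a Solr XML update message deleting every document that matches the query.
    static String deleteByQueryXml(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    // Posts the delete to the XML update handler and commits in the same request.
    static int postDelete(String updateUrl, String query) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl + "?commit=true"))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(deleteByQueryXml(query)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deleteByQueryXml("*:*"));
        if (args.length > 0) {
            // Hypothetical example: java DeleteAll http://localhost:8983/solr/update
            System.out.println("HTTP " + postDelete(args[0], "*:*"));
        }
    }
}
```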
Re: Problems querying for the keyword "a"
Yes, exactly, that's the reason; I couldn't see the forest for the trees. Thanks for your perfect and fast response.
2011/9/20 Gora Mohanty
> 2011/9/20 Héctor Trujillo:
> [...]
> > I created an index and I inserted about ten documents. I defined a field
> > named source, and I created many rows with the value "a" in this field, and
> > then I started to make queries, and then I realized that all the
> > queries that asked for the value "a" always returned zero rows
> [...]
>
> Take a look at your Solr schema in schema.xml, and stopwords.txt.
> It is very likely that "a" is being removed as a stop word.
>
> Normally, one wants this behaviour, otherwise search results would
> be cluttered with matches for simple words like "a", "an", "the", etc.
>
> Regards,
> Gora
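To see why stop-word removal gives zero hits for `source:a` but not for `source:b`, here is a toy model of an analysis chain (whitespace tokenizer, lowercase filter, stop filter). It assumes `source` is an analyzed text field with a stop filter, as Gora describes; a non-analyzed `string` field would keep "a". This is a simplified sketch, not Lucene's implementation, and the stop-word set is a small hypothetical list. In this model the same analysis runs on the indexed values and on the query, so the query term "a" is removed and nothing is left to match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordDemo {

    // Small hypothetical stop-word list (not the exact contents of any stopwords.txt).
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "the", "of", "to");

    // Toy analyzer: split on whitespace, lowercase, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOPWORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Counts field values that contain every analyzed query term.
    static int countMatches(List<String> fieldValues, String query) {
        List<String> queryTerms = analyze(query);
        if (queryTerms.isEmpty()) {
            return 0; // the whole query was analyzed away, so nothing can match
        }
        int hits = 0;
        for (String value : fieldValues) {
            if (analyze(value).containsAll(queryTerms)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "a", "a", "a", "b", "b");
        System.out.println("source:a -> " + countMatches(source, "a") + " hits"); // 0 hits
        System.out.println("source:b -> " + countMatches(source, "b") + " hits"); // 2 hits
    }
}
```

The usual fixes are to remove "a" from stopwords.txt, or to give fields like `source` a field type without a stop filter; either way the documents have to be reindexed, because "a" was already dropped at index time.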
Problems querying for the keyword "a"
Hi all, I have found something curious while experimenting with Solr and SolrJ. I don't know if this is normal, a reserved word, or maybe a bug. I can't explain it, so I'm asking here to get a reasonable explanation, if one exists.
I created an index and inserted about ten documents. I defined a field named source, created many rows with the value "a" in this field, and then started running queries. I realized that all the queries that asked for the value "a" always returned zero rows, although they should return 4 rows, because I inserted 4 rows with this value.
I ran this query with SolrJ from Java, and then with the Solr Admin web interface example that comes with Solr, and got the same result: zero rows, when I should have gotten four.
I'm a beginner with Solr, and I don't know if this is caused by a tokenizer, a query filter, or a configuration that I'm using and shouldn't be.
The query:
source:a
And I got this response:
0 0 on 0 source:a 10 2.2
If I query for the keyword "b", source:b, I get all the results that I expected.
Thanks to all. I hope someone can explain this particular behaviour to me, and sorry for my ignorance.