Re: distinct count of facet field values

2013-02-06 Thread Joey Dale

Try q=*:*&group=true&group.field=cat&group.ngroups=true
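A runnable sketch of that query (host, port, and core path are assumptions; the ngroups value in the grouped response is the distinct-value count — note that across shards ngroups is only accurate when documents are co-located by the group field):

```shell
# Build the grouping query that returns the distinct count of "cat" values.
# rows=0 skips the documents themselves; "ngroups" in the response is the count.
SOLR_BASE="http://localhost:8983/solr"   # assumed host/port/core path
QUERY='q=*:*&group=true&group.field=cat&group.ngroups=true&rows=0'
URL="${SOLR_BASE}/select?${QUERY}"
echo "$URL"    # fetch with: curl "$URL"
```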

On 2/4/13 3:53 AM, J Mohamed Zahoor wrote:

Hi

Is it possible to get the distinct count of a given facet field in Solr?

A query like q=*:*&facet=true&facet.field=cat displays the counts of all
the unique categories present, like:

electronics: 100
appliances: 200
etc.

But if the list is big, I don't want to fetch the entire list and count it by
looping. Getting just the number of items in the list would be fine.

SOLR-2242 was doing just that, but it does not give a distinct count when I
have multiple shards.

Is there any other way to get this?

./Zahoor






Re: SolrCloud: Does each node have to contain the same indexes?

2013-02-05 Thread Joey Dale
There is no reason that won't work. You may have to create the 
collections using the cores API rather than the collections API, but it 
shouldn't be too bad.
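A sketch of what that might look like with the CoreAdmin API (node addresses and core names below are hypothetical, not from this thread):

```shell
# Print a CoreAdmin CREATE call for each node that should host Index_a;
# run each printed URL with curl against the corresponding node.
NODES="node1:8983 node3:8983"          # assumed hosts for Index_a
for NODE in $NODES; do
  URL="http://${NODE}/solr/admin/cores?action=CREATE&name=index_a&collection=index_a"
  echo "$URL"
done
```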


-Joey

On 2/5/13 6:35 AM, Mathias Hodler wrote:

Hi,

can I set up a SolrCloud with 3 nodes (no sharding, only replicas) like in
the following scenario...

Node 1: Index_a, Index_b
Node 2: Index_b
Node 3: Index_a

Is this possible or does each node need the same indexes?

Thanks.





what's the communication protocol between the shards for a shard request?

2012-08-02 Thread Joey
For example, we have two shards: shard1 and shard2. Our shard requests always
go to shard1; what protocol is used when shard1 sends the request on to
shard2?

Is it HTTP, in a binary format?
We are trying to set up AppDynamics to monitor the shards, but it looks like
AppDynamics cannot instrument the request.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-s-the-cummunication-protocol-between-the-shards-for-a-shard-reqeust-tp3998839.html
Sent from the Solr - User mailing list archive at Nabble.com.


weird shards search problem

2012-08-02 Thread Joey
We have two shards - shard1 and shard2 - and each shard has two slaves, with
a VIP and load balancer in front of each set of slaves.

The shard request returns a SolrDocumentList object. For the same request,
SOMETIMES getNumFound on this object returns the correct count (say 3), but
the actual document list inside the object is empty (when we try to iterate
the list we can't find any matching doc).

However, this problem does not happen when there is only one slave per
shard.

Anyone have any idea what's happening?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/weired-shards-search-problem-tp3998841.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

2012-01-12 Thread Joey Grimm
Hi,

I am trying to use a DataImportHandler to import data from an Oracle DB.  It
works for non-date fields but throws an exception once I include the
MODIFIEDDATE field (an oracle.sql.TIMESTAMP field).  Can anyone see what I'm
doing wrong here?  Thanks.



schema.xml
   <field name="catModifiedDate" type="date" indexed="true" stored="true" />

db-data-config.xml

<entity name="category" datasource="jdbc"
        query="SELECT ID, PARENTID, ICONID, SORTORDER, MODIFIEDDATE
               FROM CATEGORY">
    <field column="ID" name="masterId" />
    <field column="PARENTID" name="catParentId" />
    <field column="ICONID" name="catIconId" />
    <field column="SORTORDER" name="catSortOrder" />
    <field column="MODIFIEDDATE" name="catModifiedDate" />
</entity>


WARNING: Error creating document :
SolrInputDocument[{catModifiedDate=catModifiedDate(1.0)={oracle.sql.TIMESTAMP@1e58565},
masterId=masterId(1.0)={124}, catParentId=catParentId(1.0)={118},
catIconId=catIconId(1.0)={304856}}]
org.apache.solr.common.SolrException: ERROR: [doc=124] Error adding field
'catModifiedDate'='oracle.sql.TIMESTAMP@1e58565'
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:324)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.solr.common.SolrException: Invalid Date
String:'oracle.sql.TIMESTAMP@1e58565'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at org.apache.solr.schema.TrieField.createField(TrieField.java:421)
at 
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:281)
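A common fix for this error (a sketch, not taken from this thread) is to convert the Oracle TIMESTAMP into a plain string before Solr tries to parse it — for example by casting in the SQL and parsing the result with DIH's DateFormatTransformer. The TO_CHAR format and the trimmed-down query below are assumptions:

```xml
<entity name="category" datasource="jdbc" transformer="DateFormatTransformer"
        query="SELECT ID, TO_CHAR(MODIFIEDDATE, 'YYYY-MM-DD HH24:MI:SS') AS MODIFIEDDATE
               FROM CATEGORY">
    <!-- dateTimeFormat tells DIH how to parse the string produced by TO_CHAR -->
    <field column="MODIFIEDDATE" name="catModifiedDate" dateTimeFormat="yyyy-MM-dd HH:mm:ss" />
</entity>
```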

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrException-Invalid-Date-String-oracle-sql-TIMESTAMP-tp3654419p3654419.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get SolrServer within my own servlet

2011-12-14 Thread Joey
Hi Chris,

I don't think there will be deadlock, because there is only one place (my own
servlet) that can trigger an index.

Yes, I am trying to embed the Solr application - I could separate my servlet
into another app and talk to Solr via HTTP, but then there would be two
pieces of software (Solr and my own app) to maintain, which is something I
don't like.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3587157.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to get SolrServer

2011-12-13 Thread Joey
Hi, I am new to Solr and want to do some custom development.
I have wrapped Solr in my own web application, and want to write a servlet
to index a file system.

The question is: how can I get a SolrServer inside my servlet?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-tp3583304p3583304.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get SolrServer within my own servlet

2011-12-13 Thread Joey
Could anybody help?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583368.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get SolrServer within my own servlet

2011-12-13 Thread Joey
Thanks Patrick for the reply.

What I did was un-jar solr.war and created my own web application. Now I
want to write my own servlet to index all files inside a folder.

I suppose there is already a SolrServer instance initialized when my web app
started.

How can I access that SolrServer instance in my servlet?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583416.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get SolrServer within my own servlet

2011-12-13 Thread Joey
Thank you guys for the reply.

So what I want to do is modify Solr a bit - add one servlet so I can
trigger a full index of a folder in the file system.

What I did:
   un-jar solr.war;
   create a web app and copy the un-jarred Solr files into this app;
   create my servlet;
   repackage the web app as a war and deploy it.

Following your suggestions, I create an EmbeddedSolrServer in my servlet:
public void init() throws ServletException {
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = null;
    try {
        coreContainer = initializer.initialize();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    }
    _solrServer = new EmbeddedSolrServer(coreContainer, "");
}

And I can now trigger the index by calling
http://localhost:8080/testservlet01. The servlet does this:

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "id1", 1.0f);
doc1.addField("name", "doc1", 1.0f);

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
try {
    _solrServer.add(docs);
    _solrServer.commit();
} catch (SolrServerException e) {
    e.printStackTrace();
}

However, it seems the search doesn't return the new documents unless I
restart my application:
localhost:8080/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


I guess there are two SolrServer instances (one is the EmbeddedSolrServer I
created myself, and the other comes with Solr itself), and they are holding
different indexes?

How can I keep them synchronized?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583741.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Special character in a field used by sort parameter

2011-05-25 Thread Joey
Marc SCHNEIDER marc.schneider73 at gmail.com writes:

 
 Hi,
 
 I have a field called test-id but I can't use it when sorting, for example:
 Doesn't work: (undefined field test)
 http://localhost:8180/solr/test-public/select/?q=test-id:1&sort=test-id+asc
 http://localhost:8180/solr/test-public/select/?q=test-id:1&sort=test\-id+asc
 
 When removing the sort parameter then it works...
 
 Is there a way of escaping the field name in sort parameter?
 
 Thanks in advance,
 Marc.
 


I've also got a similar issue. When the field name has a hyphen and the
first character is alphabetical, Solr says my field is undefined when
sorting.

a) It sorts fine when the first character is numerical, and
b) I've tried encoding the URL, but hyphens don't get encoded.

If anyone has a fix, I would be stoked to hear it.

J
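One workaround sketch (not from this thread; field names below are assumptions) is to sort on a hyphen-free copy of the field, since Solr has trouble parsing hyphenated field names in the sort parameter:

```xml
<!-- schema.xml: a hyphen-free string field used only for sorting -->
<field name="test_id_sort" type="string" indexed="true" stored="false" />
<copyField source="test-id" dest="test_id_sort" />
```

Then sort with sort=test_id_sort+asc instead of sort=test-id+asc.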



Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-11 Thread Joey Hanzel
Awesome. Thanks Jayendra.  I hadn't caught these patches yet.

I applied SOLR-2416 patch to the solr-3.1 release tag. This resolved the
problem of archive files not being unpacked and indexed with Solr CELL.
Thanks for the FYI.
https://issues.apache.org/jira/browse/SOLR-2416

On Mon, Apr 11, 2011 at 12:02 AM, Jayendra Patil 
jayendra.patil@gmail.com wrote:

 The migration of Tika to the latest 0.8 version seems to have
 reintroduced the issue.

 I was able to get this working again with the following patches. (Solr
 Cell and Data Import handler)

 https://issues.apache.org/jira/browse/SOLR-2416
 https://issues.apache.org/jira/browse/SOLR-2332

 You can try these.

 Regards,
 Jayendra

 On Sun, Apr 10, 2011 at 10:35 PM, Joey Hanzel phan...@nearinfinity.com
 wrote:
  Hi Gary,
 
  I have been experiencing the same problem... Unable to extract content
 from
  archive file formats.  I just tried again with a clean install of Solr
 3.1.0
  (using Tika 0.8) and continue to experience the same results.  Did you
 have
  any success with this problem with Solr 1.4.1 or 3.1.0 ?
 
  I'm using this curl command to send data to Solr.
  curl 
 
  http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true
 
  -H application/octet-stream -F  myfile=@data.zip
 
  No problem extracting single rich text documents, but archive files only
  result in the file names within the archive being indexed. Am I missing
  something else in my configuration? Solr doesn't seem to be unpacking the
  archive files. Based on the email chain associated with your first
 message,
  some people have been able to get this functionality to work as desired.
 
  On Mon, Jan 31, 2011 at 8:27 AM, Gary Taylor g...@inovem.com wrote:
 
  Can anyone shed any light on this, and whether it could be a config
 issue?
   I'm now using the latest SVN trunk, which includes the Tika 0.8 jars.
 
  When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt)
 to
  the ExtractingRequestHandler, I get the following log entry (formatted
 for
  ease of reading) :
 
  SolrInputDocument[
 {
 ignored_meta=ignored_meta(1.0)={
 [stream_source_info, file, stream_content_type,
  application/octet-stream, stream_size, 260, stream_name, solr1.zip,
  Content-Type, application/zip]
 },
 ignored_=ignored_(1.0)={
 [package-entry, package-entry]
 },
 ignored_stream_source_info=ignored_stream_source_info(1.0)={file},
 
 
  
 ignored_stream_content_type=ignored_stream_content_type(1.0)={application/octet-stream},
 
 ignored_stream_size=ignored_stream_size(1.0)={260},
 ignored_stream_name=ignored_stream_name(1.0)={solr1.zip},
 ignored_content_type=ignored_content_type(1.0)={application/zip},
 docid=docid(1.0)={74},
 type=type(1.0)={5},
 text=text(1.0)={  doc2.txtdoc1.txt}
 }
  ]
 
  So, the data coming back from Tika when parsing a ZIP file does not
 include
  the file contents, only the names of the files contained therein.  I've
  tried forcing stream.type=application/zip in the CURL string, but that
 makes
  no difference.  If I specify an invalid stream.type then I get an
 exception
  response, so I know it's being used.
 
  When I send one of those txt files individually to the
  ExtractingRequestHandler, I get:
 
  SolrInputDocument[
 {
 ignored_meta=ignored_meta(1.0)={
 [stream_source_info, file, stream_content_type, text/plain,
  stream_size, 30, Content-Encoding, ISO-8859-1, stream_name, doc1.txt]
 },
 ignored_stream_source_info=ignored_stream_source_info(1.0)={file},
 
 
  ignored_stream_content_type=ignored_stream_content_type(1.0)={text/plain},
 ignored_stream_size=ignored_stream_size(1.0)={30},
 ignored_content_encoding=ignored_content_encoding(1.0)={ISO-8859-1},
 ignored_stream_name=ignored_stream_name(1.0)={doc1.txt},
 docid=docid(1.0)={74},
 type=type(1.0)={5},
 text=text(1.0)={The quick brown fox  }
 }
  ]
 
  and we see the file contents in the text field.
 
  I'm using the following requestHandler definition in solrconfig.xml:
 
  <!-- Solr Cell: http://wiki.apache.org/solr/ExtractingRequestHandler -->
  <requestHandler name="/update/extract"
      class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
      startup="lazy">
    <lst name="defaults">
      <!-- All the main content goes into "text"... if you need to return
           the extracted text or do highlighting, use a stored field. -->
      <str name="fmap.content">text</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>

      <!-- capture link hrefs but ignore div attributes -->
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
    </lst>
  </requestHandler>
 
  Is there any further debug or diagnostic I can get out of Tika to help
 me
  work out why it's only returning the file names and not the file
 contents
  when parsing a ZIP file?
 
 
  Thanks and kind regards

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-10 Thread Joey Hanzel
Hi Gary,

I have been experiencing the same problem... Unable to extract content from
archive file formats.  I just tried again with a clean install of Solr 3.1.0
(using Tika 0.8) and continue to experience the same results.  Did you have
any success with this problem with Solr 1.4.1 or 3.1.0 ?

I'm using this curl command to send data to Solr.
curl 
http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true
-H application/octet-stream -F  myfile=@data.zip

No problem extracting single rich text documents, but archive files only
result in the file names within the archive being indexed. Am I missing
something else in my configuration? Solr doesn't seem to be unpacking the
archive files. Based on the email chain associated with your first message,
some people have been able to get this functionality to work as desired.

On Mon, Jan 31, 2011 at 8:27 AM, Gary Taylor g...@inovem.com wrote:

 Can anyone shed any light on this, and whether it could be a config issue?
  I'm now using the latest SVN trunk, which includes the Tika 0.8 jars.

 When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt) to
 the ExtractingRequestHandler, I get the following log entry (formatted for
 ease of reading) :

 SolrInputDocument[
{
ignored_meta=ignored_meta(1.0)={
[stream_source_info, file, stream_content_type,
 application/octet-stream, stream_size, 260, stream_name, solr1.zip,
 Content-Type, application/zip]
},
ignored_=ignored_(1.0)={
[package-entry, package-entry]
},
ignored_stream_source_info=ignored_stream_source_info(1.0)={file},

  
 ignored_stream_content_type=ignored_stream_content_type(1.0)={application/octet-stream},

ignored_stream_size=ignored_stream_size(1.0)={260},
ignored_stream_name=ignored_stream_name(1.0)={solr1.zip},
ignored_content_type=ignored_content_type(1.0)={application/zip},
docid=docid(1.0)={74},
type=type(1.0)={5},
text=text(1.0)={  doc2.txtdoc1.txt}
}
 ]

 So, the data coming back from Tika when parsing a ZIP file does not include
 the file contents, only the names of the files contained therein.  I've
 tried forcing stream.type=application/zip in the CURL string, but that makes
 no difference.  If I specify an invalid stream.type then I get an exception
 response, so I know it's being used.

 When I send one of those txt files individually to the
 ExtractingRequestHandler, I get:

 SolrInputDocument[
{
ignored_meta=ignored_meta(1.0)={
[stream_source_info, file, stream_content_type, text/plain,
 stream_size, 30, Content-Encoding, ISO-8859-1, stream_name, doc1.txt]
},
ignored_stream_source_info=ignored_stream_source_info(1.0)={file},

  ignored_stream_content_type=ignored_stream_content_type(1.0)={text/plain},
ignored_stream_size=ignored_stream_size(1.0)={30},
ignored_content_encoding=ignored_content_encoding(1.0)={ISO-8859-1},
ignored_stream_name=ignored_stream_name(1.0)={doc1.txt},
docid=docid(1.0)={74},
type=type(1.0)={5},
text=text(1.0)={The quick brown fox  }
}
 ]

 and we see the file contents in the text field.

 I'm using the following requestHandler definition in solrconfig.xml:

 <!-- Solr Cell: http://wiki.apache.org/solr/ExtractingRequestHandler -->
 <requestHandler name="/update/extract"
     class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
     startup="lazy">
   <lst name="defaults">
     <!-- All the main content goes into "text"... if you need to return
          the extracted text or do highlighting, use a stored field. -->
     <str name="fmap.content">text</str>
     <str name="lowernames">true</str>
     <str name="uprefix">ignored_</str>

     <!-- capture link hrefs but ignore div attributes -->
     <str name="captureAttr">true</str>
     <str name="fmap.a">links</str>
     <str name="fmap.div">ignored_</str>
   </lst>
 </requestHandler>

 Is there any further debug or diagnostic I can get out of Tika to help me
 work out why it's only returning the file names and not the file contents
 when parsing a ZIP file?


 Thanks and kind regards,
 Gary.



 On 25/01/2011 16:48, Jayendra Patil wrote:

 Hi Gary,

 The latest Solr Trunk was able to extract and index the contents of the
 zip
 file using the ExtractingRequestHandler.
 The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
 worked pretty well.

 Tested again with sample url and works fine -
 curl 

 http://localhost:8080/solr/core0/update/extract?stream.file=C:/temp/extract/777045.zip&literal.id=777045&literal.title=Test&commit=true
 

 You would probably need to drill down to the Tika Jars and
 the apache-solr-cell-4.0-dev.jar used for Rich documents indexing.

 Regards,
 Jayendra





Re: Solr ExtractingRequestHandler with Compressed files

2010-10-26 Thread Joey Hanzel
Hi Jayendra,

Thanks for the suggestion, I updated to Solr 1.4.1 and Solr Cell 1.4.1 and
tried sending a zip file that contained several html documents.
Unfortunately, that did not solve the problem.

Here's the curl command I used:
curl 
http://localhost:8983/solr/update/extract?literla.id=d...&uprefix=attr_&fmap.content=attri_content&commit=true
-F file=data.zip

When I query for id:doc1, the attr_content lists each filename within the
zip archive. It also indexed the stream_size, stream_source and
content_type.  It does not appear to be opening up the individual files
within the zip.

Did you have to make any other configuration changes to your solrconfig.xml
or schema.xml to read the contents of the individual files?  Would it help
to pass the specific mime type on the curl line ?

On Mon, Oct 25, 2010 at 3:27 PM, Jayendra Patil 
jayendra.patil@gmail.com wrote:

 There was this issue with the previous version of Solr, wherein only the
 file names from the zip used to get indexed.
 We had faced the same issue and ended up using the Solr trunk which has the
 Tika version upgraded and works fine.

 The Solr version 1.4.1 should also have the fix included. Try using it.

 Regards,
 Jayendra

 On Fri, Oct 22, 2010 at 6:02 PM, Joey Hanzel phan...@nearinfinity.com
 wrote:

  Hi,
 
  Has anyone had success using ExtractingRequestHandler and Tika with any
 of
  the compressed file formats (zip, tar, gz, etc) ?
 
  I am sending solr the archived.tar file using curl. curl 
 
 
  http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true
  
  -H 'Content-type:application/octet-stream' --data-binary
  @/home/archived.tar
  The result I get when I query the document is that the filenames inside
 the
  archive are indexed as the body_texts, but the content of those files
 is
  not extracted or included.  This is not the behavior I expected. Ref:
 
 
 http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example
  .
  When I send 1 of the actual documents inside the archive using the same
  curl
  command the extracted content is then stored in the body_texts field.
  Am
  I missing a step for the compressed files?
 
  I have added all the extraction dependencies as indicated by mat in
  http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-cell and
  am able to successfully extract data from MS Word, PDF, HTML documents.
 
  I'm using the following library versions.
   Solr 1.4.0,  Solr Cell 1.4.1, with Tika Core 0.4
 
  Given everything I have read this version of Tika should support
 extracting
  data from all files within a compressed file.  Any help or suggestions
  would
  be appreciated.
 



Solr ExtractingRequestHandler with Compressed files

2010-10-22 Thread Joey Hanzel
Hi,

Has anyone had success using ExtractingRequestHandler and Tika with any of
the compressed file formats (zip, tar, gz, etc) ?

I am sending solr the archived.tar file using curl. curl 
http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true
-H 'Content-type:application/octet-stream' --data-binary
@/home/archived.tar
The result I get when I query the document is that the filenames inside the
archive are indexed as the body_texts, but the content of those files is
not extracted or included.  This is not the behavior I expected. Ref:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example.
When I send 1 of the actual documents inside the archive using the same curl
command the extracted content is then stored in the body_texts field.  Am
I missing a step for the compressed files?

I have added all the extraction dependencies as indicated by mat in
http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-cell and
am able to successfully extract data from MS Word, PDF, HTML documents.

I'm using the following library versions.
  Solr 1.4.0,  Solr Cell 1.4.1, with Tika Core 0.4

Given everything I have read this version of Tika should support extracting
data from all files within a compressed file.  Any help or suggestions would
be appreciated.