[jira] Commented: (SOLR-2286) Automatically detecting Date/Time format in the DIH
[ https://issues.apache.org/jira/browse/SOLR-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998680#comment-12998680 ]

Adam Estrada commented on SOLR-2286:

Otis,

I would love to help on this because I see it as an extremely valuable time saver, but I'm afraid I wouldn't even know where to begin. I am not even at the point where I can build the project from within an IDE. If you could point me in the right direction, I can certainly take a stab at it.

Thanks,
Adam

Automatically detecting Date/Time format in the DIH
---------------------------------------------------

Key: SOLR-2286
URL: https://issues.apache.org/jira/browse/SOLR-2286
Project: Solr
Issue Type: Improvement
Components: contrib - DataImportHandler
Environment: Windows 7
Reporter: Adam Estrada
Fix For: Next
Original Estimate: 672h
Remaining Estimate: 672h

When ingesting several RSS/ATOM feeds, it's very laborious to format the date and time for each feed. I came across a bit of Java code that may or may not help alleviate some of this work.

http://www.java2s.com/Open-Source/Java-Document/RSS-RDF/Rome/com/sun/syndication/io/impl/DateParser.java.htm

I think that this would be a great addition for those of us who ingest a lot of syndicated data and then want to query on it.

Thanks,
Adam

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
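As a possible starting point for the request above, here is a minimal sketch of the general "try each known pattern in turn" idea for feed dates. It is not code from Solr, the DataImportHandler, or the linked DateParser class; the class name FeedDateGuesser and the pattern list are hypothetical and would need extending for the feeds actually being ingested.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Solr or ROME.
class FeedDateGuesser {

    // Assumed candidate patterns, tried in order. More specific ones come first.
    private static final String[] CANDIDATE_PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss Z",    // RFC 822 style with numeric offset (typical for RSS)
        "EEE, dd MMM yyyy HH:mm:ss zzz",  // RFC 822 style with a zone name
        "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",   // ISO 8601 style with milliseconds (typical for Atom)
        "yyyy-MM-dd'T'HH:mm:ssXXX",       // ISO 8601 style without milliseconds
        "yyyy-MM-dd"                      // date only, interpreted as midnight UTC
    };

    /** Returns the parsed date, or null if no candidate pattern matches the whole string. */
    static Date parse(String raw) {
        if (raw == null) {
            return null;
        }
        String text = raw.trim();
        for (String pattern : CANDIDATE_PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setLenient(false);                         // reject values such as month 13
            format.setTimeZone(TimeZone.getTimeZone("UTC"));  // default zone when the text has none
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(text, position);
            // Accept only if the pattern consumed the entire input.
            if (parsed != null && position.getIndex() == text.length()) {
                return parsed;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parse("Tue, 28 Dec 2010 14:39:46 +0000"));
        System.out.println(parse("2010-12-28T14:39:46Z"));
        System.out.println(parse("not a date"));
    }
}
```

Returning null for unrecognized input, rather than throwing, leaves it to the caller to decide whether to skip the field or fail the import; that choice is an assumption of this sketch.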
[jira] Issue Comment Edited: (SOLR-2301) RSS Feed URL Breaking
[ https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976333#action_12976333 ]

Adam Estrada edited comment on SOLR-2301 at 12/31/10 1:57 PM:

Hoss,

You are correct! When I replaced & with &amp; everything worked correctly. I hope that others find this issue useful!

v/r,
Adam

was (Author: adamestrada):

Hoss,

You are correct! When I replaced & with &amp; everything worked correctly. I hope that others find this issue useful!

v/r,
Adam

RSS Feed URL Breaking
---------------------

Key: SOLR-2301
URL: https://issues.apache.org/jira/browse/SOLR-2301
Project: Solr
Issue Type: Bug
Components: clients - C#
Affects Versions: 1.4.1, 4.0
Environment: Windows 7
Reporter: Adam Estrada

This is an odd one. I am trying to index RSS feeds and have come across several issues. Some are more pressing than others. Referring to SOLR-2286 ;-)

Anyway, the CDC has a list of RSS feeds that the Solr dataimporter can't work with.

Home page: http://emergency.cdc.gov/rss/
Page to Index: http://www2a.cdc.gov/podcasts/createrss.asp?t=r&c=19

The console reports the following, and as you can see it's because it does not like the param c. Any ideas on how to fix this?

INFO: Processing configuration from solrconfig.xml: {config=./solr/conf/dataimporthandler/rss.xml}
[Fatal Error] :18:63: The reference to entity c must end with the ';' delimiter.
Dec 28, 2010 2:39:46 PM org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
        at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:193)
        at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:100)
        at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:112)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:539)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:596)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:660)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:412)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
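The failure reported in this issue can be reproduced outside Solr with a small sketch. It assumes the feed URL carried a raw ampersand before the c parameter, which is what the parser's complaint about "entity c" suggests. The class AmpersandInConfig and the one-element document are hypothetical simplifications for illustration, not the actual rss.xml configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of Solr.
class AmpersandInConfig {

    /** Returns the root element's url attribute, or null if the XML is not well-formed. */
    static String readUrl(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("url");
        } catch (Exception e) {
            // A raw '&' starts an entity reference, so "&c=19" is rejected by the parser.
            return null;
        }
    }

    public static void main(String[] args) {
        String base = "http://www2a.cdc.gov/podcasts/createrss.asp";
        // Raw ampersand in the attribute value: not well-formed XML.
        String broken = "<entity url=\"" + base + "?t=r&c=19\"/>";
        // Ampersand written as the predefined entity &amp;: well-formed XML.
        String fixed = "<entity url=\"" + base + "?t=r&amp;c=19\"/>";
        System.out.println("raw ampersand:     " + readUrl(broken));
        System.out.println("escaped ampersand: " + readUrl(fixed));
    }
}
```

With the escaped form, the parser hands back the attribute value with a plain & again, so the URL that is eventually requested is unchanged; only its spelling inside the XML file differs. The parser may also write its own error line to standard error for the broken case.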
[jira] Issue Comment Edited: (SOLR-2301) RSS Feed URL Breaking
[ https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976333#action_12976333 ]

Adam Estrada edited comment on SOLR-2301 at 12/31/10 1:58 PM:

Hoss,

You are correct! When I replaced & with <code>&amp;</code> everything worked correctly. I hope that others find this issue useful!

v/r,
Adam

was (Author: adamestrada):

Hoss,

You are correct! When I replaced & with &amp; everything worked correctly. I hope that others find this issue useful!

v/r,
Adam
[jira] Reopened: (SOLR-2301) RSS Feed URL Breaking
[ https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Estrada reopened SOLR-2301:

Hoss,

You are correct! When I replaced & with &amp; everything worked correctly. I hope that others find this issue useful!

v/r,
Adam
[jira] Issue Comment Edited: (SOLR-2301) RSS Feed URL Breaking
[ https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976333#action_12976333 ]

Adam Estrada edited comment on SOLR-2301 at 12/31/10 1:58 PM:

Hoss,

You are correct! When I replaced & with <blockcode>&amp;</blockcode> everything worked correctly. I hope that others find this issue useful!

v/r,
Adam

was (Author: adamestrada):

Hoss,

You are correct! When I replaced & with <code>&amp;</code> everything worked correctly. I hope that others find this issue useful!

v/r,
Adam
[jira] Issue Comment Edited: (SOLR-2301) RSS Feed URL Breaking
[ https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976333#action_12976333 ]

Adam Estrada edited comment on SOLR-2301 at 12/31/10 1:59 PM:

Hoss,

You are correct! When I replaced & with & amp; everything worked correctly. Note the space in the HTML representation. I hope that others find this issue useful!

v/r,
Adam

was (Author: adamestrada):

Hoss,

You are correct! When I replaced & with <blockcode>&amp;</blockcode> everything worked correctly. I hope that others find this issue useful!

v/r,
Adam
[jira] Commented: (SOLR-2301) RSS Feed URL Breaking
[ https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975618#action_12975618 ]

Adam Estrada commented on SOLR-2301:

Thanks Carl,

I heard somewhere that Manifold, or the Connector Framework, was going to be integrated into Lucene/Solr. Any thoughts on that?

Adam
[jira] Created: (SOLR-2286) Automatically detecting Date/Time format in the DIH
Automatically detecting Date/Time format in the DIH
---------------------------------------------------

Key: SOLR-2286
URL: https://issues.apache.org/jira/browse/SOLR-2286
Project: Solr
Issue Type: Improvement
Components: contrib - DataImportHandler
Affects Versions: 1.4.1
Environment: Windows 7
Reporter: Adam Estrada
Fix For: 1.4.1

When ingesting several RSS/ATOM feeds, it's very laborious to format the date and time for each feed. I came across a bit of Java code that may or may not help alleviate some of this work.

http://www.java2s.com/Open-Source/Java-Document/RSS-RDF/Rome/com/sun/syndication/io/impl/DateParser.java.htm

I think that this would be a great addition for those of us who ingest a lot of syndicated data and then want to query on it.

Thanks,
Adam