[jira] Commented: (CONNECTORS-100) DB lock timeout
[ https://issues.apache.org/jira/browse/CONNECTORS-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906793#action_12906793 ]

Karl Wright commented on CONNECTORS-100:
----------------------------------------

Just in case, I've run a spinner test pounding on the API at the same time the UI is being pounded on. This is a Windows laptop with the current trunk version. No such errors appear for me.

DB lock timeout
---------------

Key: CONNECTORS-100
URL: https://issues.apache.org/jira/browse/CONNECTORS-100
Project: Apache Connectors Framework
Issue Type: Bug
Components: Framework core
Environment: Running unmodified dist/example from trunk/ using the default configuration.
Reporter: Andrzej Bialecki

When a job is started and running (via crawler-ui) occasionally it's not possible to display a list of running jobs. The problem persists even after restarting ACF. The following exception is thrown in the console:

{code}
org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: A lock could not be obtained within the time requested
	at org.apache.acf.core.database.Database.executeViaThread(Database.java:421)
	at org.apache.acf.core.database.Database.executeUncachedQuery(Database.java:465)
	at org.apache.acf.core.database.Database$QueryCacheExecutor.create(Database.java:1072)
	at org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
	at org.apache.acf.core.database.Database.executeQuery(Database.java:167)
	at org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:727)
	at org.apache.acf.crawler.jobs.JobManager.makeJobStatus(JobManager.java:5611)
	at org.apache.acf.crawler.jobs.JobManager.getAllStatus(JobManager.java:5549)
	at org.apache.jsp.showjobstatus_jsp._jspService(showjobstatus_jsp.java:316)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
	at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
	at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.sql.SQLTransactionRollbackException: A lock could not be obtained within the time requested
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
	at org.apache.acf.core.database.Database.execute(Database.java:526)
	at org.apache.acf.core.database.Database$ExecuteQueryThread.run(Database.java:381)
{code}
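Derby reports a lock timeout as java.sql.SQLTransactionRollbackException, which means the transaction was rolled back and is generally safe to retry. A minimal retry sketch in Java (the LockRetry helper is hypothetical, not part of ACF):

```java
import java.sql.SQLException;
import java.sql.SQLTransactionRollbackException;
import java.util.concurrent.Callable;

// Hypothetical retry helper (not part of ACF): rerun an operation when Derby
// rolls it back with a lock timeout, backing off between attempts.
// Assumes maxAttempts >= 1.
public class LockRetry {
    public static <T> T withRetry(Callable<T> op, int maxAttempts) throws Exception {
        SQLException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SQLTransactionRollbackException e) {
                last = e;                      // remember the failure
                Thread.sleep(100L << attempt); // simple exponential backoff
            }
        }
        throw last; // give up after maxAttempts tries
    }
}
```

This only papers over contention, of course; if timeouts happen constantly, the timeout setting or the database choice is the real fix.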
[jira] Commented: (CONNECTORS-99) REST API serialization inconsistency
[ https://issues.apache.org/jira/browse/CONNECTORS-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906799#action_12906799 ]

Andrzej Bialecki commented on CONNECTORS-99:
--------------------------------------------

Yes, it's a wish :) I can live with the way things are, and I can always check whether it's a naked object or an array... it's just not too friendly for the client. At least it would be good to document this behavior.

REST API serialization inconsistency
------------------------------------

Key: CONNECTORS-99
URL: https://issues.apache.org/jira/browse/CONNECTORS-99
Project: Apache Connectors Framework
Issue Type: Wish
Components: API
Environment: ACF trunk.
Reporter: Andrzej Bialecki
Priority: Minor

There is some inconsistency in the REST API that makes the returned values more difficult to process than necessary. It boils down to the fact that lists of values are serialized into JSON arrays only when there is more than one element on the list, but are serialized into plain JSON objects when there is zero or one element on the list. Examples:

* Listings of jobs, connectors, connections, repositories etc. all suffer from this symptom:
{code}
1 element:  {job:{id:1283811504796,description:job 1 ...
2 elements: {job:[{id:1283811504796,description:job 1 ...
{code}
* Nested elements, such as e.g. job metadata:
{code}
1 element:  metadata:{_value_:,_attribute_name:jobKey1,_attribute_value:jobVal1}
2 elements: metadata:[{_value_:,_attribute_name:jobKey1,_attribute_value:jobVal1},{_value_:,_attribute_name:jobKey2,_attribute_value:jobVal2}]
{code}

In my opinion, in all the above cases the API should always return a JSON array for those elements that can occur with cardinality > 1.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
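Until the serialization is made uniform, a client can normalize on its side, as the comment suggests ("check whether it's a naked object or an array"). A minimal sketch in Java, assuming the response has already been parsed into a generic object graph of Maps and Lists; the helper name is hypothetical:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical client-side helper (not part of ACF): the API returns a bare
// JSON object for 0-or-1 element lists and an array otherwise, so coerce
// whatever was parsed into a uniform List before iterating.
public class JsonNormalize {
    @SuppressWarnings("unchecked")
    public static List<Object> asList(Object node) {
        if (node == null)
            return Collections.emptyList();     // field absent: empty list
        if (node instanceof List)
            return (List<Object>) node;         // already an array
        return Collections.singletonList(node); // naked object: wrap it
    }
}
```

With this in place the client iterates the same way regardless of how many jobs or metadata entries came back.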
[jira] Commented: (CONNECTORS-100) DB lock timeout
[ https://issues.apache.org/jira/browse/CONNECTORS-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906805#action_12906805 ]

Karl Wright commented on CONNECTORS-100:
----------------------------------------

Ok, that's a rather different scenario than you first described. Mainly, the database is under high load conditions, because you are in fact crawling - and it is possible that you are crawling flat-out, without any significant throttling, as well. It's entirely possible that Derby's default lock timeout is simply not long enough to support those conditions.

If you want to continue to use the quick-start for your crawl task, then you will probably want to research how to increase this timeout using the Derby configuration file. My suggestion, though, would be to try using PostgreSQL instead, since that has much more well-known behavior characteristics. You can use PostgreSQL with the quick-start by changing the following line in properties.xml from:

{code}
<property name="org.apache.acf.databaseimplementationclass" value="org.apache.acf.core.database.DBInterfaceDerby"/>
{code}

to:

{code}
<property name="org.apache.acf.databaseimplementationclass" value="org.apache.acf.core.database.DBInterfacePostgreSQL"/>
{code}

You will, of course, also need to install PostgreSQL as well.

DB lock timeout
---------------

Key: CONNECTORS-100
URL: https://issues.apache.org/jira/browse/CONNECTORS-100
Project: Apache Connectors Framework
Issue Type: Bug
Components: Framework core
Environment: Running unmodified dist/example from trunk/ using the default configuration.
Reporter: Andrzej Bialecki

When a job is started and running (via crawler-ui) occasionally it's not possible to display a list of running jobs. The problem persists even after restarting ACF.
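For reference, the Derby knob Karl alludes to is the lock wait timeout, which can be raised in a derby.properties file in the Derby system directory. The values below are examples only; derby.locks.waitTimeout defaults to 60 seconds:

```properties
# derby.properties (in the Derby system directory) -- example values only
derby.locks.waitTimeout=300
derby.locks.deadlockTimeout=60
```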
[jira] Created: (CONNECTORS-101) File system connector would benefit by default crawling rules
File system connector would benefit by default crawling rules
-------------------------------------------------------------

Key: CONNECTORS-101
URL: https://issues.apache.org/jira/browse/CONNECTORS-101
Project: Apache Connectors Framework
Issue Type: Improvement
Components: File system connector
Reporter: Karl Wright
Priority: Minor

When you add a path to a file system connector job, it should automatically put in rules that cause it to include all files and directories under that path. This makes it easier to use, and more easily demonstrable too.
[jira] Created: (CONNECTORS-102) Web Connector should have a prepopulated bandwidth throttle
Web Connector should have a prepopulated bandwidth throttle
-----------------------------------------------------------

Key: CONNECTORS-102
URL: https://issues.apache.org/jira/browse/CONNECTORS-102
Project: Apache Connectors Framework
Issue Type: Improvement
Components: Web connector
Reporter: Karl Wright
Priority: Minor

When you first create a web connector connection, the bandwidth tab should come prepopulated with a bandwidth throttle that has the following data:

Description: All domains
Bin regular expression: blank
Max connections: 2
Max KB per second: 64
Max fetches per minute: 12

Too many casual users of ACF have been crawling without any throttling, and that's going to give ACF a bad name in the long run.
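The "max fetches per minute" part of such a throttle can be pictured as a sliding-window rate limiter. A minimal sketch in Java (hypothetical illustration, not ACF's actual throttling code):

```java
import java.util.ArrayDeque;

// Hypothetical sliding-window limiter (not ACF's throttler): allow at most
// maxPerMinute fetches in any trailing 60-second window.
public class FetchThrottle {
    private final int maxPerMinute;
    private final ArrayDeque<Long> stamps = new ArrayDeque<>();

    public FetchThrottle(int maxPerMinute) {
        this.maxPerMinute = maxPerMinute;
    }

    // nowMillis is passed in so the logic is easy to test. Returns true if a
    // fetch may proceed now, false if the caller should wait and retry.
    public synchronized boolean tryAcquire(long nowMillis) {
        while (!stamps.isEmpty() && nowMillis - stamps.peekFirst() >= 60_000)
            stamps.pollFirst();     // drop timestamps outside the window
        if (stamps.size() >= maxPerMinute)
            return false;           // window is full: throttle this fetch
        stamps.addLast(nowMillis);
        return true;
    }
}
```

A real crawler would also enforce the connection and bandwidth caps, but the windowed-count idea is the same.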
[jira] Created: (CONNECTORS-103) RSS connector: Have better initial default values for throttling
RSS connector: Have better initial default values for throttling
----------------------------------------------------------------

Key: CONNECTORS-103
URL: https://issues.apache.org/jira/browse/CONNECTORS-103
Project: Apache Connectors Framework
Issue Type: Improvement
Components: RSS connector
Reporter: Karl Wright
Priority: Minor

When you first create an RSS connector connection, the bandwidth tab should come prepopulated with the following values:

Max connections per server: 2
Max KB per second per server: 64
Max fetches per minute per server: 12

Too many casual users of ACF have been crawling without any throttling, and that's going to give ACF a bad name in the long run.
[jira] Created: (CONNECTORS-104) Make it easier to limit a web crawl to a single site
Make it easier to limit a web crawl to a single site
----------------------------------------------------

Key: CONNECTORS-104
URL: https://issues.apache.org/jira/browse/CONNECTORS-104
Project: Apache Connectors Framework
Issue Type: Improvement
Components: Web connector
Affects Versions: LCF Release 0.5
Reporter: Jack Krupansky
Priority: Minor
Fix For: LCF Release 0.5

Unless the user explicitly enters an include regex carefully, a web crawl can quickly get out of control and start crawling the entire web, when all the user may really want is to crawl just a single web site or a portion thereof. So it would be preferable if, either by default or with a simple button, the crawl could be limited to the seed web site(s).
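One way such a "limit to seed site" button could work is to derive an include regex from each seed URL that pins the crawl to the seed's host. A sketch in Java (hypothetical helper; ACF's actual inclusion-rule machinery may differ, and it ignores ports and www/non-www variants):

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ACF): build an include regex that limits
// the crawl to the scheme and host of a seed URL, with any path beneath it.
public class SeedScope {
    public static String includeRegexFor(String seedUrl) {
        URI u = URI.create(seedUrl);
        String prefix = u.getScheme() + "://" + u.getHost();
        // Quote the prefix so dots in the host are literal; path is optional.
        return "^" + Pattern.quote(prefix) + "(/.*)?$";
    }
}
```

Generating one such regex per seed and OR-ing them together would approximate the behavior the ticket asks for.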
[jira] Resolved: (CONNECTORS-101) File system connector would benefit by default crawling rules
[ https://issues.apache.org/jira/browse/CONNECTORS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-101.
------------------------------------
Fix Version/s: LCF Release 0.5
Resolution: Fixed

r993551. By the way, the UI is really pretty bad for this connector also, so I may open a ticket to clean that up as well.

File system connector would benefit by default crawling rules
-------------------------------------------------------------

Key: CONNECTORS-101
URL: https://issues.apache.org/jira/browse/CONNECTORS-101
Project: Apache Connectors Framework
Issue Type: Improvement
Components: File system connector
Reporter: Karl Wright
Assignee: Karl Wright
Priority: Minor
Fix For: LCF Release 0.5

When you add a path to a file system connector job, it should automatically put in rules that cause it to include all files and directories under that path. This makes it easier to use, and more easily demonstrable too.
[jira] Created: (CONNECTORS-105) File system connector UI no longer adheres to connector UI standards, needs to be updated
File system connector UI no longer adheres to connector UI standards, needs to be updated
-----------------------------------------------------------------------------------------

Key: CONNECTORS-105
URL: https://issues.apache.org/jira/browse/CONNECTORS-105
Project: Apache Connectors Framework
Issue Type: Improvement
Components: File system connector
Reporter: Karl Wright
Priority: Minor

The file system connector specification Paths tab no longer adheres to the prevailing connector standard, which suggests a table for rule list displays. The connector UI should be updated.
[jira] Assigned: (CONNECTORS-105) File system connector UI no longer adheres to connector UI standards, needs to be updated
[ https://issues.apache.org/jira/browse/CONNECTORS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright reassigned CONNECTORS-105:
--------------------------------------
Assignee: Karl Wright

File system connector UI no longer adheres to connector UI standards, needs to be updated
-----------------------------------------------------------------------------------------

Key: CONNECTORS-105
URL: https://issues.apache.org/jira/browse/CONNECTORS-105
Project: Apache Connectors Framework
Issue Type: Improvement
Components: File system connector
Reporter: Karl Wright
Assignee: Karl Wright
Priority: Minor
Fix For: LCF Release 0.5

The file system connector specification Paths tab no longer adheres to the prevailing connector standard, which suggests a table for rule list displays. The connector UI should be updated.
[jira] Resolved: (CONNECTORS-105) File system connector UI no longer adheres to connector UI standards, needs to be updated
[ https://issues.apache.org/jira/browse/CONNECTORS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-105.
------------------------------------
Fix Version/s: LCF Release 0.5
Resolution: Fixed

r993565.

File system connector UI no longer adheres to connector UI standards, needs to be updated
-----------------------------------------------------------------------------------------

Key: CONNECTORS-105
URL: https://issues.apache.org/jira/browse/CONNECTORS-105
Project: Apache Connectors Framework
Issue Type: Improvement
Components: File system connector
Reporter: Karl Wright
Assignee: Karl Wright
Priority: Minor
Fix For: LCF Release 0.5

The file system connector specification Paths tab no longer adheres to the prevailing connector standard, which suggests a table for rule list displays. The connector UI should be updated.