Re: [dspace-tech] Re: DSpace 5.9 performance

2018-11-05 Thread kardeiz
Hi Alan,

Thanks for the note and sharing your solution.

Another possible solution would be to just ignore caching entirely for the 
search pages.

It seems like whatever time this cache saves in the Cocoon processing 
pipeline is far outweighed by the cost of all the database queries needed 
to look up bundles and bitstreams. As far as I can tell, discovery pages 
don't need any bitstream information beyond what is in the Solr index. 
These bundle/bitstream queries also seem to run even when there is a valid 
cache entry (my understanding is that the caching applies to later Cocoon 
processing steps).
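The argument above can be illustrated with a toy model. This is a minimal Java sketch, not DSpace or Cocoon code: the `Db` counter, the assumption of one lookup per bitstream, and the page of 20 results with 100 bitstreams each are all made up for illustration. It shows that when a cache key is derived from every bitstream, the lookups are paid on every request, cache hits included.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model, not DSpace or Cocoon code: a search page whose cache key is
 * derived from every bitstream of every result item. Because the key must be
 * built before the cache can be consulted, the lookups run on cache hits too.
 */
public class CacheKeyCost {

    /** Stand-in for the database; counts one lookup per bitstream visited (assumed cost). */
    static final class Db {
        int lookups = 0;

        long bitstreamChecksum(int itemIndex, int bitstreamIndex) {
            lookups++;
            return 31L * itemIndex + bitstreamIndex;
        }
    }

    /** Builds a cache key by visiting every bitstream of every result item. */
    static long cacheKey(List<Integer> bitstreamsPerItem, Db db) {
        long key = 17;
        for (int i = 0; i < bitstreamsPerItem.size(); i++) {
            for (int b = 0; b < bitstreamsPerItem.get(i); b++) {
                key = 31 * key + db.bitstreamChecksum(i, b);
            }
        }
        return key;
    }

    /**
     * Serves the same page {@code requests} times through the cache.
     * Returns {lookups, cache entries}; one entry means every request after
     * the first was a cache hit.
     */
    static int[] runCached(List<Integer> bitstreamsPerItem, int requests) {
        Db db = new Db();
        Map<Long, String> cache = new HashMap<>();
        for (int r = 0; r < requests; r++) {
            long key = cacheKey(bitstreamsPerItem, db); // paid on every request
            cache.computeIfAbsent(key, k -> "page rendered from Solr fields");
        }
        return new int[] {db.lookups, cache.size()};
    }

    public static void main(String[] args) {
        // Hypothetical page: 20 results, each carrying 100 bitstreams.
        List<Integer> page = Collections.nCopies(20, 100);
        int[] result = runCached(page, 10);
        System.out.println("bitstream lookups: " + result[0] + ", cache entries: " + result[1]);
    }
}
```

Running `main` reports 20000 bitstream lookups but only one cache entry: nine of the ten requests are cache hits, yet none of them avoids the lookups. A page that skipped caching and rendered from Solr fields alone (again, an assumption about what those pages need) would make no bitstream lookups at all.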

I'm not a Cocoon expert, though, and I haven't read through all the 
DSpaceValidity code, so I might be wrong.

Jacob

On Monday, November 5, 2018 at 1:49:00 PM UTC-6, Alan Orth wrote:
>
> Good work, Jacob.
>
> I think I'll test this on our DSpace 5.8 site as well. Our solution is 
> different: we just severely rate limit and dissuade bots from accessing 
> dynamic pages like discover, browse, search-filter, and most-popular 
> (specifically, the ones in communities and collections, because the 
> site-wide robots.txt can't use wildcards). See our nginx configuration 
> template:
>
>
> https://github.com/ilri/rmg-ansible-public/commit/1aadbb839659bcb2326fbc9bb0b2b67bf13ed7f0
>
> Cheers,
>

Re: [dspace-tech] Re: DSpace 5.9 performance

2018-11-01 Thread kardeiz
PR is at https://github.com/DSpace/DSpace/pull/2254.

Jacob

Re: [dspace-tech] Re: DSpace 5.9 performance

2018-11-01 Thread kardeiz
Hi Tim,

I wasn't sure whether my assumption (that only bitstreams in the ORIGINAL 
bundle are relevant to search-results cache invalidation) would hold for 
all DSpace users. 

I'll go ahead and open a PR though.

Jacob



On Thursday, November 1, 2018 at 3:47:13 PM UTC-5, Tim Donohue wrote:
>
> Hi Jacob,
>
> Would you be willing to submit a GitHub Pull Request with the code changes 
> you've made?  Or, create a ticket in our ticketing system (
> https://jira.duraspace.org/browse/DS) to describe the problem and attach 
> the fix? (You can request a JIRA account by just emailing 
> sysa...@duraspace.org.)
>
> Most of the development / bug fixes and improvements to DSpace come from 
> community members like yourself (and situations just like this -- where 
> someone figures out a fix that is generally applicable to others).  More on 
> our code contribution process can be found at: 
> https://wiki.duraspace.org/display/DSPACE/Code+Contribution+Guidelines 
>
> - Tim
>
[dspace-tech] Re: DSpace 5.9 performance

2018-11-01 Thread kardeiz
I've figured this out!

`org.dspace.app.xmlui.utils.DSpaceValidity`, which is used in 
`AbstractSearch` to cache results, actually looks up and keys all bundles, 
then all bitstreams, for each item in the search results.

It seems reasonable to assume (at least for our use case) that only 
bitstreams in the ORIGINAL bundle are relevant to search results (i.e., a 
change in a public file is a reason to invalidate the cache, but a change 
in non-ORIGINAL files is not).

I've added a method to `DSpaceValidity` called 
`addIfItemOnlyAddOriginalBundles`, which only keys ORIGINAL bundles for an 
`Item`, and defers to the existing `add` for everything else. I then 
updated `AbstractSearch` to call my `addIfItemOnlyAddOriginalBundles` when 
it is adding the search result DSOs to the validity object.
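The change described above can be sketched against a simplified model. This is a hypothetical illustration, not the code in the PR: `Item`, `Bundle`, `Bitstream`, `Validity` and the `bitstreamLookups` counter are stand-ins for the real DSpace classes, the "MASTER" bundle name is borrowed from the dark-files case discussed in this thread, and the code needs Java 16+ (records, pattern matching for `instanceof`). It shows exhaustive keying next to an ORIGINAL-only variant that defers to `add` for non-items.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of a validity-key builder. The types below are hypothetical
 * stand-ins, not DSpace's Item/Bundle/Bitstream or the real DSpaceValidity.
 */
public class ValiditySketch {
    interface DSpaceObjectLike { int id(); }
    record Bitstream(int id, String checksum) {}
    record Bundle(String name, List<Bitstream> bitstreams) {}
    record Item(int id, List<Bundle> bundles) implements DSpaceObjectLike {}
    record Collection(int id) implements DSpaceObjectLike {}

    static final class Validity {
        private final StringBuilder key = new StringBuilder();
        int bitstreamLookups = 0; // stands in for per-bitstream metadata queries

        /** Exhaustive keying: the item, then every bundle, then every bitstream. */
        void add(DSpaceObjectLike dso) {
            key.append(dso.id()).append(';');
            if (dso instanceof Item item) {
                for (Bundle bundle : item.bundles()) {
                    addBundle(bundle);
                }
            }
        }

        /** Keys only ORIGINAL bundles for items; defers to add() for anything else. */
        void addIfItemOnlyAddOriginalBundles(DSpaceObjectLike dso) {
            if (!(dso instanceof Item item)) {
                add(dso);
                return;
            }
            key.append(item.id()).append(';');
            for (Bundle bundle : item.bundles()) {
                if ("ORIGINAL".equals(bundle.name())) {
                    addBundle(bundle);
                }
            }
        }

        private void addBundle(Bundle bundle) {
            key.append(bundle.name()).append(';');
            for (Bitstream bitstream : bundle.bitstreams()) {
                bitstreamLookups++;
                key.append(bitstream.checksum()).append(';');
            }
        }

        String key() {
            return key.toString();
        }
    }

    /** Builds an item with one ORIGINAL bundle and one dark MASTER bundle. */
    static Item sampleItem(int id, int originalCount, int darkCount) {
        List<Bundle> bundles = new ArrayList<>();
        bundles.add(new Bundle("ORIGINAL", bitstreams(id * 1000, originalCount)));
        bundles.add(new Bundle("MASTER", bitstreams(id * 1000 + originalCount, darkCount)));
        return new Item(id, bundles);
    }

    private static List<Bitstream> bitstreams(int firstId, int count) {
        List<Bitstream> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            list.add(new Bitstream(firstId + i, "md5-" + (firstId + i)));
        }
        return list;
    }

    public static void main(String[] args) {
        Item item = sampleItem(1, 1, 100);

        Validity full = new Validity();
        full.add(item);

        Validity originalOnly = new Validity();
        originalOnly.addIfItemOnlyAddOriginalBundles(item);

        System.out.println("all bundles:   " + full.bitstreamLookups + " lookups");
        System.out.println("ORIGINAL only: " + originalOnly.bitstreamLookups + " lookups");
    }
}
```

For an item with one ORIGINAL file and 100 dark files, the exhaustive version visits 101 bitstreams and the filtered one visits 1. The trade-off is the one raised in the thread: a change to a non-ORIGINAL file no longer invalidates the cached page.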

This has dropped my SQL query count per page load from over 9000 to about 
60, and the page now loads relatively quickly.

Unfortunately, this won't help those who have lots of bitstreams in their 
ORIGINAL bundle, but that is perhaps unavoidable.

Jacob



On Wednesday, October 31, 2018 at 11:52:25 AM UTC-5, kar...@gmail.com wrote:
>
> Hi all,
>
> We are running DSpace 5.9 XMLUI with Tomcat 7 and Java 8 on a RHEL 7 
> server, with a small-ish collection of items (about 20,000). We are running 
> production with Oracle 12, but I have replicated the same issue with 
> PostgreSQL 9.2.
>
> We have recently noticed some very long page load times. Any given 
> discover/search page can take 2-7 seconds to load, and under even a 
> moderate amount of traffic (e.g., when a bot is indexing the site at about 
> 10 requests per second), page loads can take 30-60 seconds or longer.
>
> We have made the changes suggested at 
> https://wiki.duraspace.org/display/DSDOC5x/Performance+Tuning+DSpace for 
> both Tomcat and PostgreSQL.
>
> Our production site has been customized extensively, but I was able to 
> replicate the issue with an untouched DSpace 5.9 build using the default 
> Mirage theme with XMLUI.
>
> The issue is the same with both Oracle and PostgreSQL (PostgreSQL seems a 
> little bit better). 
>
> I have tried changing from Java 8 to Java 7.
>
> I have bumped up the database connection pool size to 300.
>
> Digging through the logs is difficult, since the problem only really 
> emerges under (moderate) load.
>
> However, I was able to track a single page request (to /discover), and 
> noticed that there were over 9000 individual SQL queries (for a single page 
> load) that looked like:
>
> DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query "SELECT * FROM MetadataValue WHERE resource_id= ? and resource_type_id = ? ORDER BY metadata_field_id, place"  with parameters: 144458,0
>
> (The resource_type_id `0` is for bitstreams.)
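A small sketch of how one might tally these queries per request from a debug log, grouped by `resource_type_id`. It assumes each log entry sits on a single line in exactly the format quoted above (the email wraps it); the class name and the sample resource ids are made up for illustration, and only type `0` (bitstreams) is interpreted, per the post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetadataQueryCounter {
    // Matches the tail of the quoted DEBUG line: "... with parameters: 144458,0"
    // group(1) = resource_id, group(2) = resource_type_id
    private static final Pattern PARAMS = Pattern.compile("with parameters: (\\d+),(\\d+)");

    /** Counts MetadataValue queries per resource_type_id in the given log lines. */
    static Map<Integer, Integer> countByType(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (!line.contains("FROM MetadataValue")) {
                continue; // only the per-object metadata lookups are of interest
            }
            Matcher m = PARAMS.matcher(line);
            if (m.find()) {
                counts.merge(Integer.parseInt(m.group(2)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String query = "DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query "
                + "\"SELECT * FROM MetadataValue WHERE resource_id= ? and resource_type_id = ? "
                + "ORDER BY metadata_field_id, place\"  with parameters: ";
        // Made-up sample: two bitstream lookups and one lookup of another type.
        List<String> sample = Arrays.asList(query + "144458,0", query + "144459,0", query + "9001,2");
        // Per the post, resource_type_id 0 means bitstream; other ids are printed as-is.
        System.out.println(countByType(sample));
    }
}
```

For a real log, the lines for a single request would first need to be isolated, for example by time window or by session, before being passed to `countByType`.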
>
> I *think* (but could be wrong) that this is the source of our performance 
> problem: the database is just getting bogged down with so many 
> requests. Looking at PostgreSQL's slow query logging, some of these 
> individual queries are taking about 1 second.
>
> Our situation is perhaps unique in that we have dozens (sometimes 
> hundreds) of "dark" (non-ORIGINAL) archival files associated with an item, 
> and it looks like this discover page is trying to load metadata for all of 
> them.
>
> This doesn't happen with an equivalent query in JSPUI.
>
> Any suggestions or workarounds? Why does the search page need to get 
> metadata for all bitstreams? 
>
> Does anyone know if upgrading to DSpace 6 would resolve this issue?
>
> Thanks,
>
> Jacob
>
>
>

-- 
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.


Re: [dspace-tech] Re: DSpace 6.2 with collection home page too slow to load if items have lots of bitsteams

2018-11-01 Thread kardeiz
Hi Bill,

Thanks for your note. A quick follow up question:

On the discovery pages you tested, do most of the search result items have 
one, two, or three bitstreams (e.g., an ORIGINAL file, a PREVIEW, and a 
THUMBNAIL), or do they have many?

I think my issue is that many of my search result items have dozens, 
sometimes hundreds, of bitstreams associated with them, and the discovery 
page loads metadata individually for each of these bitstreams (for a 
search results page with 20 items, that could easily mean many thousands 
of queries).
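A back-of-the-envelope version of that estimate, as a minimal Java sketch. It assumes one metadata query per bundle and one per bitstream for every result item; the per-item counts (three bundles with one file each, versus four bundles and 150 dark files) are hypothetical, not measured, and the 10 requests per second figure is the bot rate mentioned earlier in the thread.

```java
/**
 * Rough estimate of metadata queries per search page, assuming one query per
 * bundle and one per bitstream for every result item. Counts are hypothetical.
 */
public class QueryEstimate {

    static long queriesPerPage(int resultsPerPage, int bundlesPerItem, int bitstreamsPerItem) {
        return (long) resultsPerPage * (bundlesPerItem + bitstreamsPerItem);
    }

    public static void main(String[] args) {
        // Typical item: ORIGINAL, PREVIEW and THUMBNAIL, one bitstream each.
        long typical = queriesPerPage(20, 3, 3);
        // Item with a few bundles and 150 dark archival files.
        long heavy = queriesPerPage(20, 4, 150);
        // A bot fetching about 10 pages per second multiplies the load accordingly.
        long heavyPerSecond = heavy * 10;

        System.out.println("typical page: " + typical + " queries");
        System.out.println("heavy page:   " + heavy + " queries");
        System.out.println("heavy pages at 10 req/s: " + heavyPerSecond + " queries/s");
    }
}
```

Under these assumptions a typical page costs 120 queries while the heavy page costs 3,080, or 30,800 per second at bot speed, which is the same order of magnitude as the 9000+ queries observed per page.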

Thanks,

Jacob



On Thursday, November 1, 2018 at 9:40:11 AM UTC-5, Bill T wrote:
>
> In my case, I see no performance problem on discovery pages.  It is 
> only on pages with a large number of records, for instance 
> community-list or on community pages with a large number of 
> sub-communities and collections. 
>
> For me, it is most apparent for /community-list, which takes a couple 
> minutes to display. 
> -- Bill 

[dspace-tech] Re: DSpace 6.2 with collection home page too slow to load if items have lots of bitsteams

2018-10-31 Thread kardeiz
Hi all,

As Ying suggests in his second post here, this seems to affect search (i.e., 
/discover) pages as well. The fix mentioned by Tim 
(https://github.com/DSpace/DSpace/pull/2016) doesn't seem to affect search 
pages.

I've encountered this on both 5.9 and 6.3. I created another post 
(https://groups.google.com/forum/#!topic/dspace-tech/PbKbDfQlhok) before I 
saw this one.

Jacob

On Wednesday, February 21, 2018 at 4:02:06 PM UTC-6, Ying Jin wrote:
>
> Dear All,
>
> We are experiencing a performance issue with DSpace 6.2. Some of our 
> collections time out or take several minutes to load. They are not big 
> collections; one of them has only 20 items, but each item contains 100+ 
> bitstreams (one PDF in the ORIGINAL bundle and 100+ JP2 files in a 
> customized MASTER bundle, which is hidden from end users). PostgreSQL is 
> left with many "idle in transaction" processes that slow down the whole 
> site, and Tomcat starts taking most of the CPU time too.
>
> We used 16G of memory for Tomcat under v5.x, and performance was OK. 
> Now, even after increasing it to 64G (half of our server memory) under 
> 6.2, performance hasn't improved. 
>
> We are using Tomcat 8.0.13, Java 1.8, Red Hat Linux 6.7, and XMLUI. We have 
> about 80+ communities, 280+ collections, 60,000+ items and 490,000 
> bitstreams. 
>
> To determine the cause of the slowness, I made two copies of the same 
> collection. In one, I removed all MASTER files and left only the PDF. In 
> the other, I zipped all MASTER files into one zip file and uploaded it 
> with the PDF. Neither copy has any performance issues, so it seems the 
> number of bitstreams is what affects performance.
>
> After turning on debugging, I got the following information. The whole 
> log is too big to upload, so I've included just the Hibernate statistics here. 
>
> ==
> The collection has 100+ bitstreams in it
> ==
>
> 2018-02-07 00:43:17,724 DEBUG 
> org.hibernate.stat.internal.ConcurrentStatisticsImpl @ HHH000117: HQL: 
> null, time: 1ms, rows: 1
>
> 2018-02-07 00:43:17,882 INFO  
> org.hibernate.engine.internal.StatisticalLoggingSessionEventListener 
> @ Session Metrics {
>
> 237746 nanoseconds spent acquiring 1 JDBC connections;
>
> 0 nanoseconds spent releasing 0 JDBC connections;
>
> 552698751 nanoseconds spent preparing 48017 JDBC statements;
>
> 12590561333 nanoseconds spent executing 48017 JDBC statements;
>
> 0 nanoseconds spent executing 0 JDBC batches;
>
> 929992 nanoseconds spent performing 52 L2C puts;
>
> 188492 nanoseconds spent performing 10 L2C hits;
>
> 1935100 nanoseconds spent performing 42 L2C misses;
>
> 133422494 nanoseconds spent executing 2 flushes (flushing a total of 
> 43868 entities and 58348 collections);
>
> 562373235433 nanoseconds spent executing 20136 partial-flushes 
> (flushing a total of 235915143 entities and 235915143 collections)
>
> }
>
> 2018-02-07 00:43:17,884 INFO  
> org.hibernate.engine.internal.StatisticalLoggingSessionEventListener 
> @ Session Metrics {
>
> 0 nanoseconds spent acquiring 0 JDBC connections;
>
> 0 nanoseconds spent releasing 0 JDBC connections;
>
> 0 nanoseconds spent preparing 0 JDBC statements;
>
> 0 nanoseconds spent executing 0 JDBC statements;
>
> 0 nanoseconds spent executing 0 JDBC batches;
>
> 0 nanoseconds spent performing 0 L2C puts;
>
> 0 nanoseconds spent performing 0 L2C hits;
>
> 0 nanoseconds spent performing 0 L2C misses;
>
> 0 nanoseconds spent executing 0 flushes (flushing a total of 0 
> entities and 0 collections);
>
> 0 nanoseconds spent executing 0 partial-flushes (flushing a total of 
> 0 entities and 0 collections)
>
> }
>
> =
>
> The collection has PDF only
> =
>
> 2018-02-17 20:30:27,534 INFO  
> org.hibernate.engine.internal.StatisticalLoggingSessionEventListener 
> @ Session Metrics {
>
> 341778 nanoseconds spent acquiring 1 JDBC connections;
>
> 0 nanoseconds spent releasing 0 JDBC connections;
>
> 15974134 nanoseconds spent preparing 1189 JDBC statements;
>
> 271327429 nanoseconds spent executing 1189 JDBC statements;
>
> 0 nanoseconds spent executing 0 JDBC batches;
>
> 717412 nanoseconds spent performing 45 L2C puts;
>
> 3906724 nanoseconds spent performing 247 L2C hits;
>
> 2081066 nanoseconds spent performing 855 L2C misses;
>
> 6829028 nanoseconds spent executing 2 flushes (flushing a total of 
> 3916 entities and 5072 collections);
>
> 578786157 nanoseconds spent executing 261 partial-flushes (flushing a 
> total of 345641 entities and 345641 collections)
>
> }
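To make the two "Session Metrics" blocks easier to compare, here is a minimal Java sketch that converts the quoted nanosecond figures to seconds. The numbers are copied from the post; the class and variable names are illustrative only.

```java
/**
 * Converts the Hibernate "Session Metrics" figures quoted in this post from
 * nanoseconds to seconds so the two collections can be compared directly.
 */
public class SessionMetricsComparison {

    static double seconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Collection whose items carry 100+ bitstreams each
        long heavyStatements = 48017;
        long heavyExecuteNanos = 12_590_561_333L;
        long heavyPartialFlushes = 20136;
        long heavyPartialFlushNanos = 562_373_235_433L;

        // Same collection with the MASTER files removed (PDF only)
        long lightStatements = 1189;
        long lightExecuteNanos = 271_327_429L;
        long lightPartialFlushes = 261;
        long lightPartialFlushNanos = 578_786_157L;

        System.out.printf("JDBC statements:   %d vs %d (%.1fx)%n",
                heavyStatements, lightStatements, (double) heavyStatements / lightStatements);
        System.out.printf("statement execute: %.2f s vs %.2f s%n",
                seconds(heavyExecuteNanos), seconds(lightExecuteNanos));
        System.out.printf("partial flushes:   %d taking %.2f s vs %d taking %.2f s%n",
                heavyPartialFlushes, seconds(heavyPartialFlushNanos),
                lightPartialFlushes, seconds(lightPartialFlushNanos));
    }
}
```

By these figures, the heavy collection spends about 562 s in partial flushes against about 12.6 s executing its roughly 40 times larger number of statements, while the PDF-only copy spends under a second on each.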
>
> .
>
>
> 2018-02-17 20:30:27,536 INFO  
> org.hibernate.engine.internal.StatisticalLoggingSessionEventListener 
> @ Session Metrics {
>
> 120941 nanoseconds spent acquiring 1 JDBC connections;
>
> 0 nanoseconds 

[dspace-tech] Re: DSpace 5.9 performance

2018-10-31 Thread kardeiz
This seems to be related 
to https://groups.google.com/d/topic/dspace-tech/VIofW7EwEXY/discussion, 
but for 5.9, and in relation to the discovery pages rather than the 
collection lists.


[dspace-tech] Re: DSpace 5.9 performance

2018-10-31 Thread kardeiz
I just upgraded to 6.3 on my development server, and it looks like the 
problem is just as bad, if not worse.

Jacob

On Wednesday, October 31, 2018 at 11:52:25 AM UTC-5, kar...@gmail.com wrote:
>
> Hi all,
>
> We are running DSpace 5.9 XMLUI with Tomcat 7 and Java 8 on a RHEL 7 
> server, with a small-ish collection of items (about 20,000). We are running 
> production with Oracle 12, but I have replicated the same issue with 
> PostgreSQL 9.2.
>
> We have recently noticed some very long page load times. Any given 
> discover/search page can take 2-7 seconds to load, and when there is even a 
> moderate amount of traffic (e.g., when a bot is indexing the site at about 
> 10 requests per second), page load times can take 30-60 seconds or longer.
>
> We have made the changes suggested at 
> https://wiki.duraspace.org/display/DSDOC5x/Performance+Tuning+DSpace for 
> both Tomcat and PostgreSQL.
>
> Our production site has been customized extensively, but I was able to 
> replicate the issue with an untouched DSpace 5.9 build using the default 
> Mirage theme with XMLUI.
>
> The issue is the same with both Oracle and PostgreSQL (PostgreSQL seems a 
> little bit better). 
>
> I have tried changing from Java 8 to Java 7.
>
> I have bumped up the database connection pool size to 300.
>
> Digging through the logs is difficult, since the problem only really 
> emerges under (moderate) load.
>
> However, I was able to track a single page request (to /discover), and 
> noticed that there were over 9000 individual SQL queries (for a single page 
> load) that looked like:
>
> DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query "SELECT * 
> FROM MetadataValue WHERE resource_id= ? and resource_type_id = ? ORDER BY 
> metadata_field_id, place"  with parameters: 144458,0
>
> (The resource_type_id `0` is for bitstreams.)
>
> I *think* (but could be wrong) that this is the source of our performance 
> problem; that the database is just getting bogged down with so many 
> requests. Looking at PostgreSQL's slow query logging, some of these 
> individual queries are taking about 1 second.
>
> Our situation is perhaps unique in that we have dozens (sometimes 
> hundreds) of "dark" (non-ORIGINAL) archival files associated with an item, 
> and it looks like this discover page is trying to load metadata for all of 
> them.
>
> This doesn't happen with an equivalent query in JSPUI.
>
> Any suggestions or workarounds? Why does the search page need to get 
> metadata for all bitstreams? 
>
> Does anyone know if upgrading to DSpace 6 would resolve this issue?
>
> Thanks,
>
> Jacob
>
>
>


[dspace-tech] DSpace 5.9 performance

2018-10-31 Thread kardeiz
Hi all,

We are running DSpace 5.9 XMLUI with Tomcat 7 and Java 8 on a RHEL 7 
server, with a small-ish collection of items (about 20,000). We are running 
production with Oracle 12, but I have replicated the same issue with 
PostgreSQL 9.2.

We have recently noticed some very long page load times. Any given 
discover/search page can take 2-7 seconds to load, and when there is even a 
moderate amount of traffic (e.g., when a bot is indexing the site at about 
10 requests per second), page load times can take 30-60 seconds or longer.

We have made the changes suggested 
at https://wiki.duraspace.org/display/DSDOC5x/Performance+Tuning+DSpace for 
both Tomcat and PostgreSQL.

Our production site has been customized extensively, but I was able to 
replicate the issue with an untouched DSpace 5.9 build using the default 
Mirage theme with XMLUI.

The issue is the same with both Oracle and PostgreSQL (PostgreSQL seems a 
little bit better). 

I have tried changing from Java 8 to Java 7.

I have bumped up the database connection pool size to 300.

Digging through the logs is difficult, since the problem only really 
emerges under (moderate) load.

However, I was able to trace a single page request (to /discover) and 
found that this one page load issued over 9000 individual SQL queries that 
looked like:

DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query "SELECT * 
FROM MetadataValue WHERE resource_id= ? and resource_type_id = ? ORDER BY 
metadata_field_id, place"  with parameters: 144458,0

(The resource_type_id `0` is for bitstreams.)
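The 9000-query figure above came from reading the DEBUG log by hand. A small script can do the tallying instead. Below is a minimal Python sketch, assuming DEBUG logging for `org.dspace.storage.rdbms.DatabaseManager` is enabled and each query is logged on one line in the format quoted above; the function names are illustrative and not part of DSpace.

```python
import re
from collections import Counter

# Matches the DatabaseManager DEBUG format quoted above. Only the
# 'Running query "..." with parameters: ...' part is relied on; any
# timestamp/thread prefix in a real log line is ignored.
QUERY_RE = re.compile(r'Running query "(?P<sql>.*?)"\s+with parameters: (?P<params>.*)$')


def count_queries(lines):
    """Count logged queries per distinct SQL statement (parameters ignored)."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            # Normalize whitespace so identical statements group together.
            sql = " ".join(match.group("sql").split())
            counts[sql] += 1
    return counts


def top_queries(path, limit=10):
    """Return the most frequent statements in one log file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return count_queries(fh).most_common(limit)
```

For example, `top_queries("dspace.log.2018-10-31")` (a hypothetical file name) would list the ten most frequent statements. Grouping by statement rather than by parameter set is what makes a per-bitstream query pattern stand out.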

I *think* (but could be wrong) that this is the source of our performance 
problem: the database is simply getting bogged down by so many queries. 
PostgreSQL's slow query log shows some of these individual queries taking 
about 1 second each.
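Some rough arithmetic shows why a bot at about 10 requests per second can be enough to swamp the database. This is a back-of-envelope sketch using the figures from this message (10 requests/s, about 9000 queries per page, 2-7 s per page) and Little's law (requests in flight = arrival rate times time per request), which only holds in steady state.

```python
def load_estimate(requests_per_second, queries_per_request, seconds_per_request):
    """Back-of-envelope load figures.

    queries_per_second: what the database must absorb to keep up.
    concurrent_requests: Little's law, L = lambda * W (steady state only).
    """
    queries_per_second = requests_per_second * queries_per_request
    concurrent_requests = requests_per_second * seconds_per_request
    return queries_per_second, concurrent_requests


for seconds in (2, 7, 30):
    qps, in_flight = load_estimate(10, 9000, seconds)
    print(f"{seconds:>2} s/page: {qps} queries/s, ~{in_flight} requests in flight")
```

At 10 requests/s and 9000 queries per page the database would need to sustain about 90,000 queries per second; if it cannot, requests queue and page times grow, which would be consistent with the 30-60 second loads described above. At 30 s per page roughly 300 requests are in flight, the same as the 300-connection pool, if (as an assumption) each in-flight request holds one pooled connection.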

Our situation is perhaps unique in that we have dozens (sometimes hundreds) 
of "dark" (non-ORIGINAL) archival files associated with an item, and it 
looks like this discover page is trying to load metadata for all of them.
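The logged pattern (one `SELECT ... WHERE resource_id = ?` per bitstream) is the classic N+1 query problem. The sketch below uses Python with an in-memory SQLite table loosely modeled on `MetadataValue` (a simplified, hypothetical schema; this is not DSpace's DAO code) to contrast one query per bitstream with a batched `IN (...)` lookup that returns the same rows.

```python
import sqlite3

PER_ID_SQL = (
    "SELECT resource_id, metadata_field_id, place, text_value FROM metadatavalue "
    "WHERE resource_id = ? AND resource_type_id = ? ORDER BY metadata_field_id, place"
)


class CountingDB:
    """Wraps a sqlite3 connection and counts the statements it executes."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_demo_db(n_bitstreams, values_each=3):
    """In-memory stand-in for the MetadataValue table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadatavalue (resource_id INTEGER, resource_type_id INTEGER,"
        " metadata_field_id INTEGER, place INTEGER, text_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO metadatavalue VALUES (?, 0, ?, 1, ?)",
        [(rid, f, f"value {rid}.{f}") for rid in range(1, n_bitstreams + 1)
         for f in range(1, values_each + 1)],
    )
    return conn


def load_one_by_one(db, ids):
    # N+1 pattern: one round trip per bitstream (resource_type_id 0).
    return {rid: db.execute(PER_ID_SQL, (rid, 0)).fetchall() for rid in ids}


def load_batched(db, ids, chunk=500):
    # One round trip per chunk of ids; chunking keeps the IN list under
    # database limits (e.g. SQLite bind-variable limits, Oracle's 1000-item IN list).
    ids = list(ids)
    result = {rid: [] for rid in ids}
    for start in range(0, len(ids), chunk):
        part = ids[start:start + chunk]
        marks = ",".join("?" * len(part))
        rows = db.execute(
            "SELECT resource_id, metadata_field_id, place, text_value FROM metadatavalue"
            f" WHERE resource_type_id = 0 AND resource_id IN ({marks})"
            " ORDER BY resource_id, metadata_field_id, place",
            part,
        ).fetchall()
        for row in rows:
            result[row[0]].append(row)
    return result
```

With 1200 bitstreams the first approach issues 1200 queries and the batched one issues 3. Batching does not reduce the amount of metadata read, though, so skipping bitstreams the page never displays may matter more than batching.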

This doesn't happen with an equivalent query in JSPUI.

Any suggestions or workarounds? Why does the search page need to get 
metadata for all bitstreams? 
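One way to avoid touching the dark files would be to consider only the ORIGINAL bundle when building a search result row. The sketch below models that filter in Python with hypothetical `Item`/`Bundle`/`Bitstream` classes (not DSpace's API) and a hypothetical "ARCHIVE" bundle standing in for the dark files; whether every site can safely ignore non-ORIGINAL bundles here is an assumption each site would need to check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    id: int
    name: str


@dataclass
class Bundle:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    handle: str
    bundles: List[Bundle] = field(default_factory=list)


def listing_bitstreams(item, wanted=("ORIGINAL",)):
    """Bitstreams a search-result row might need: only those in the wanted bundles.

    Bitstreams in other (e.g. dark/archival) bundles are skipped, so their
    metadata would never need to be loaded for the listing.
    """
    return [bs for bundle in item.bundles if bundle.name in wanted
            for bs in bundle.bitstreams]
```

For an item with one ORIGINAL file and 200 archival scans, the filter leaves a single bitstream to look up instead of 201.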

Does anyone know if upgrading to DSpace 6 would resolve this issue?

Thanks,

Jacob



Re: [dspace-tech] dspace vs. tomcat user account

2018-09-19 Thread kardeiz
Hi Tim,

Thanks for the reply!

Unfortunately, I don't think that would work with the RedHat/CentOS 7 
Tomcat (7) package. I just spun up a fresh centos:7 docker container with 
packaged Tomcat to confirm.

The user is hard-coded into the systemd service file (which could be 
modified). 

When Tomcat is run as another user, logging is not persisted in 
/var/log/tomcat (unless permissions are changed). It may be necessary to 
update the permissions on the work and temp directories as well; I didn't 
have a way to test that quickly.

Years ago we did run Tomcat as the dspace user, under RedHat 6, where 
Tomcat 7 was provided by EPEL. This mostly worked fine, though in one 
instance an unattended Tomcat upgrade broke permissions (quickly resolved, 
but it required manual intervention).

Jacob



On Wednesday, September 19, 2018 at 11:27:31 AM UTC-5, Tim Donohue wrote:
>
> Hello Jacob,
>
> I'd recommend *not* modifying the "tomcat" user account, and simply 
> updating Tomcat to run as the "dspace" user.  That's exactly the strategy 
> we use on the http://demo.dspace.org site (running Ubuntu 16.04).  It has 
> worked fine for us, even with "unattended upgrades" enabled.  In case it's 
> useful, we have publicly shared the Puppet scripts we use for the setup of 
> demo.dspace.org.  Here's all of the permission changes/tweaks we make to 
> Tomcat:
>
>
> https://github.com/DSpace/puppet-dspace/blob/master/manifests/tomcat_instance.pp
>
> In case you don't know Puppet syntax well, here it is in human terms:
>
>1. Install Tomcat as normal (from apt-get).  
>CATALINA_HOME=/usr/share/tomcat7   CATALINA_BASE=/var/lib/tomcat7
>2. Override the default  setting, configuring 
>it to use the [dspace]/webapps/ location.  E.g. appBase="/home/dspace/dspace/webapps" unpackWARs="true" autoDeploy="true">
>3. Stop Tomcat
>4. Update the service script to run-as "dspace".  This involves 
>editing the /etc/default/tomcat7 script and changing 
>"TOMCAT7_USER=dspace".  NOTE: We leave the TOMCAT7_GROUP=tomcat7
>5. Change the ownership (recursively) on CATALINA_BASE to 
>"dspace:tomcat7" (notice again we keep the tomcat7 group, and also that we 
>do NOT change permissions on CATALINA_HOME)
>
> That's basically it.  That Puppet script does some other stuff specific to 
> our setup...but, these 5 steps are all you need to setup Tomcat to run as a 
> "dspace" user.  As noted, we also have Ubuntu's "unattended-upgrades" 
> enabled on this server, and have yet to notice it revert our CATALINA_BASE 
> back to different ownership.
>
> With these settings in place, you can run all builds as the "dspace" user. 
> You'll notice that DSpace itself (see #2) is installed under 
> ~dspace/dspace/  (which is in the "dspace" user home directory).  That 
> directory is owned by "dspace:dspace".
>
> - Tim
>
> On Wed, Sep 19, 2018 at 10:56 AM > wrote:
>
>> I guess mostly I am uncomfortable modifying a user account that I didn't 
>> create (tomcat). I don't know all the ins and outs and permissions granted 
>> to that user as part of the tomcat package (in my case from RedHat 7 
>> server). I don't fully know the security implications of letting tomcat 
>> login.
>>
>> Because of that, it is awkward to work with DSpace (requiring all those 
>> `sudo -u tomcat ...` commands).
>>
>> Until recently I was running DSpace where the DSpace logs and a couple 
>> other directories were owned by dspace:tomcat, and this mostly worked well; 
>> it was just somewhat painful to maintain.
>>
>> Jacob
>>
>>
>>
>> On Wednesday, September 19, 2018 at 10:41:11 AM UTC-5, helix84 wrote:
>>
>>> In order to run without issues, tomcat has to run as the user who owns 
>>> the dspace files, whatever the user is named. 
>>> What is it specifically that makes a difference to you in how the user 
>>> is named? 
>>>
>>>
>>> Regards, 
>>> ~~helix84 
>>>
>>> Compulsory reading: DSpace Mailing List Etiquette 
>>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette 
>>>
>>> On Wed, Sep 19, 2018 at 5:34 PM  wrote: 
>>> > 
>>> > Thanks for the reply, helix84. 
>>> > 
>>> > But giving tomcat a password and a login shell has security 
>>> implications (which are similar to, but not the same as, having a `dspace` 
>>> user own the files). 
>>> > 
>>> > Jacob 
>>> > 
>>> > On Wednesday, September 19, 2018 at 10:25:14 AM UTC-5, helix84 wrote: 
>>> >> 
>>> >> As you noticed, if you're using packaged tomcat, it's easier to use 
>>> >> the tomcat user created by the tomcat package as the owner of all 
>>> >> dspace files. However, you don't have to sudo every single command. 
>>> >> You can either: 
>>> >> 
>>> >> A) run a shell as the tomcat user like this: 
>>> >> sudo -u tomcat -i 
>>> >> 
>>> >> B) or you can allow login for that user by setting its password: 
>>> >> sudo passwd tomcat 
>>> >> and making sure the user has a valid shell: 
>>> >> sudo usermod --shell /bin/bash tomcat 
>>> >> 
> >>> >> In the latter case you
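Step 4 of Tim's list quoted above (set `TOMCAT7_USER=dspace` in /etc/default/tomcat7 while leaving `TOMCAT7_GROUP=tomcat7`) is a one-line edit. Below is a minimal Python sketch of that edit, assuming a shell-style KEY=value defaults file as on Ubuntu; on RHEL/CentOS 7 the user is set in the systemd unit instead, so it would not apply there. Stopping Tomcat, writing the file back, and the recursive chown of CATALINA_BASE (step 5) are left out.

```python
import re


def set_default_var(text, key, value):
    """Set KEY=value in a shell-style defaults file such as /etc/default/tomcat7.

    An existing (possibly commented-out) KEY= line is replaced in place;
    otherwise the assignment is appended. Other lines are left untouched.
    """
    # [ \t]* rather than \s* so the match never spans a newline.
    pattern = re.compile(rf"^#?[ \t]*{re.escape(key)}=.*$", re.MULTILINE)
    assignment = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslash-escape processing of `value`.
        return pattern.sub(lambda _m: assignment, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + assignment + "\n"
```

Only the first matching line is changed, so a file that sets the user twice would still need a manual look.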

Re: [dspace-tech] dspace vs. tomcat user account

2018-09-19 Thread kardeiz
I guess mostly I am uncomfortable modifying a user account that I didn't 
create (tomcat). I don't know all the ins and outs and permissions granted 
to that user as part of the tomcat package (in my case on a RedHat 7 
server). I don't fully know the security implications of letting tomcat 
log in.

Because of that, it is awkward to work with DSpace (requiring all those 
`sudo -u tomcat ...` commands).

Until recently I was running DSpace where the DSpace logs and a couple of 
other directories were owned by dspace:tomcat, and this mostly worked well; 
it was just somewhat painful to maintain.

Jacob



On Wednesday, September 19, 2018 at 10:41:11 AM UTC-5, helix84 wrote:
>
> In order to run without issues, tomcat has to run as the user who owns 
> the dspace files, whatever the user is named. 
> What is it specifically that makes a difference to you in how the user is 
> named? 
>
>
> Regards, 
> ~~helix84 
>
> Compulsory reading: DSpace Mailing List Etiquette 
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette 
>
> On Wed, Sep 19, 2018 at 5:34 PM > wrote: 
> > 
> > Thanks for the reply, helix84. 
> > 
> > But giving tomcat a password and a login shell has security implications 
> (which are similar to, but not the same as, having a `dspace` user own the 
> files). 
> > 
> > Jacob 
> > 
> > On Wednesday, September 19, 2018 at 10:25:14 AM UTC-5, helix84 wrote: 
> >> 
> >> As you noticed, if you're using packaged tomcat, it's easier to use 
> >> the tomcat user created by the tomcat package as the owner of all 
> >> dspace files. However, you don't have to sudo every single command. 
> >> You can either: 
> >> 
> >> A) run a shell as the tomcat user like this: 
> >> sudo -u tomcat -i 
> >> 
> >> B) or you can allow login for that user by setting its password: 
> >> sudo passwd tomcat 
> >> and making sure the user has a valid shell: 
> >> sudo usermod --shell /bin/bash tomcat 
> >> 
> >> In the latter case you would log in normally as the tomcat user with 
> >> the password you set. 
> >> 
> >> 
> >> Regards, 
> >> ~~helix84 
> >> 
> >> Compulsory reading: DSpace Mailing List Etiquette 
> >> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette 
> >> 
> >> On Wed, Sep 19, 2018 at 5:15 PM  wrote: 
> >> > 
> >> > DSpace installation instructions suggest creating a `dspace` user 
> account to own the DSpace installation, while current advice (1, 2, 3) 
> suggests having `tomcat` be the owner of these files (though all of these 
> links are several years old at this point). 
> >> > 
> >> > Neither is ideal. 
> >> > 
> >> > It is relatively easy to set up Tomcat to run as a different user 
> (I've used the instructions at 
> https://askubuntu.com/questions/371809/run-tomcat7-as-tomcat7-or-any-other-user/527826#527826
>  
> before), but the permission changes required are reverted whenever Tomcat 
> is updated by one's package manager. 
> >> > 
> >> > It is also relatively easy to just assign ownership of the dspace 
> installation files to Tomcat, but on some Linux distros `tomcat` is a 
> nologin user, which makes running mvn, ant, and bin/dspace commands awkward 
> (`sudo -u tomcat ...`). 
> >> > 
> >> > It is also possible to change group ownership settings on some DSpace 
> dirs (log, assetstore, etc.) so that tomcat can write to them, but it is 
> difficult to keep these up to date, and the Solr index is especially tricky 
> permission-wise. 
> >> > 
> >> > It would be really nice if both the DSpace installation files and the 
> files generated at runtime (logs, etc.) had permissions that were conducive 
> to group-based reading/writing. 
> >> > 
> >> > Is this something others are interested in? What is the current 
> consensus on this issue? 
> >> > 
> >> > I'm currently using the following setup (with DSpace 5.9): 
> >> > 
> >> > Tomcat is run as tomcat. DSpace (both source and installation) is 
> owned by tomcat. 
> >> > 
> >> > To build, I have to do `sudo -u tomcat /full/path/to/maven package`, 
> then `sudo -u tomcat /full/path/to/ant update -f 
> path/to/dspace/target/dspace-installer/build.xml`. 
> >> > 
> >> > Any DSpace command (e.g., `index-discovery` or `filter-media`), I run 
> as `sudo -u tomcat path/to/dspace/bin/dspace ...`. 
> >> > 
> >> > Thanks, 
> >> > 
> >> > Jacob 
> >> > 

Re: [dspace-tech] dspace vs. tomcat user account

2018-09-19 Thread kardeiz
Thanks for the reply, helix84.

But giving tomcat a password and a login shell has security implications 
(which are similar to, but not the same as, having a `dspace` user own the 
files).

Jacob

On Wednesday, September 19, 2018 at 10:25:14 AM UTC-5, helix84 wrote:
>
> As you noticed, if you're using packaged tomcat, it's easier to use 
> the tomcat user created by the tomcat package as the owner of all 
> dspace files. However, you don't have to sudo every single command. 
> You can either: 
>
> A) run a shell as the tomcat user like this: 
> sudo -u tomcat -i 
>
> B) or you can allow login for that user by setting its password: 
> sudo passwd tomcat 
> and making sure the user has a valid shell: 
> sudo usermod --shell /bin/bash tomcat 
>
> In the latter case you would log in normally as the tomcat user with 
> the password you set. 
>
>
> Regards, 
> ~~helix84 
>
> Compulsory reading: DSpace Mailing List Etiquette 
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette 
>
> On Wed, Sep 19, 2018 at 5:15 PM > wrote: 
> > 
> > DSpace installation instructions suggest creating a `dspace` user 
> account to own the DSpace installation, while current advice (1, 2, 3) 
> suggests having `tomcat` be the owner of these files (though all of these 
> links are several years old at this point). 
> > 
> > Neither is ideal. 
> > 
> > It is relatively easy to set up Tomcat to run as a different user (I've 
> used the instructions at 
> https://askubuntu.com/questions/371809/run-tomcat7-as-tomcat7-or-any-other-user/527826#527826
>  
> before), but the permission changes required are reverted whenever Tomcat 
> is updated by one's package manager. 
> > 
> > It is also relatively easy to just assign ownership of the dspace 
> installation files to Tomcat, but on some Linux distros `tomcat` is a 
> nologin user, which makes running mvn, ant, and bin/dspace commands awkward 
> (`sudo -u tomcat ...`). 
> > 
> > It is also possible to change group ownership settings on some DSpace 
> dirs (log, assetstore, etc.) so that tomcat can write to them, but it is 
> difficult to keep these up to date, and the Solr index is especially tricky 
> permission-wise. 
> > 
> > It would be really nice if both the DSpace installation files and the 
> files generated at runtime (logs, etc.) had permissions that were conducive 
> to group-based reading/writing. 
> > 
> > Is this something others are interested in? What is the current 
> consensus on this issue? 
> > 
> > I'm currently using the following setup (with DSpace 5.9): 
> > 
> > Tomcat is run as tomcat. DSpace (both source and installation) is owned 
> by tomcat. 
> > 
> > To build, I have to do `sudo -u tomcat /full/path/to/maven package`, 
> then `sudo -u tomcat /full/path/to/ant update -f 
> path/to/dspace/target/dspace-installer/build.xml`. 
> > 
> > Any DSpace command (e.g., `index-discovery` or `filter-media`), I run as 
> `sudo -u tomcat path/to/dspace/bin/dspace ...`. 
> > 
> > Thanks, 
> > 
> > Jacob 
> > 
>


[dspace-tech] dspace vs. tomcat user account

2018-09-19 Thread kardeiz
DSpace installation instructions suggest creating a `dspace` user account 
to own the DSpace installation, while current advice (1, 2, 3) suggests 
having `tomcat` be the owner of these files (though all of these links are 
several years old at this point).

Neither is ideal. 

It is relatively easy to set up Tomcat to run as a different user (I've 
used the instructions at 
https://askubuntu.com/questions/371809/run-tomcat7-as-tomcat7-or-any-other-user/527826#527826 
before), but the required permission changes are reverted whenever Tomcat 
is updated by one's package manager.

It is also relatively easy to just assign ownership of the dspace 
installation files to Tomcat, but on some Linux distros `tomcat` is a 
nologin user, which makes running mvn, ant, and bin/dspace commands awkward 
(`sudo -u tomcat ...`).

It is also possible to change group ownership settings on some DSpace dirs 
(log, assetstore, etc.) so that tomcat can write to them, but it is 
difficult to keep these settings up to date, and the Solr index is 
especially tricky permission-wise.

*It would be really nice if both the DSpace installation files and the 
files generated at runtime (logs, etc.) had permissions that were conducive 
to group-based reading/writing.*
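Below is a minimal Python sketch of the group-based layout wished for above: make a tree group-readable and group-writable, and set the setgid bit on directories so new files inherit the directory's group. It is roughly `chmod -R g+rwX` plus `chmod g+s` on directories. The optional `gid` argument is an assumption of this sketch, and changing a file's group requires owning it and belonging to the target group (or being root).

```python
import os
import stat


def _share(path, is_dir, gid):
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return  # leave symlinks (and whatever they point to) alone
    if gid is not None and st.st_gid != gid:
        os.chown(path, -1, gid)  # -1 keeps the current owner
    mode = stat.S_IMODE(st.st_mode) | stat.S_IRGRP | stat.S_IWGRP
    if is_dir:
        # Group needs x to traverse; setgid makes new entries inherit the group.
        mode |= stat.S_IXGRP | stat.S_ISGID
    elif st.st_mode & stat.S_IXUSR:
        mode |= stat.S_IXGRP  # roughly chmod's "X": keep executables executable
    os.chmod(path, mode)


def make_group_shared(root, gid=None):
    """Give the group read/write on everything under root, setgid on directories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        _share(dirpath, True, gid)
        for name in filenames:
            _share(os.path.join(dirpath, name), False, gid)
```

Setgid only fixes the group of new files; whether they are group-writable still depends on the umask of the process that creates them (Tomcat, or a `bin/dspace` command), which is one reason such settings are hard to keep up to date.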

Is this something others are interested in? What is the current consensus 
on this issue?

I'm currently using the following setup (with DSpace 5.9):

Tomcat is run as tomcat. DSpace (both source and installation) is owned by 
tomcat.

To build, I have to do `sudo -u tomcat /full/path/to/maven package`, then 
`sudo -u tomcat /full/path/to/ant update -f 
path/to/dspace/target/dspace-installer/build.xml`.

I run any DSpace command (e.g., `index-discovery` or `filter-media`) as 
`sudo -u tomcat path/to/dspace/bin/dspace ...`.
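The two-step build described above (`mvn package`, then `ant update` against the generated installer) can be wrapped so that each step runs as the Tomcat user. Below is a minimal Python sketch, assuming Maven is run from the top of the DSpace source tree and the installer lives at `dspace/target/dspace-installer/build.xml` under it; all paths in any example are hypothetical. It defaults to a dry run that only prints the commands.

```python
import os
import shlex
import subprocess


def build_commands(user, mvn, ant, source_dir):
    """Commands for the two-step DSpace build, each run via `sudo -u user`."""
    installer = os.path.join(source_dir, "dspace", "target", "dspace-installer", "build.xml")
    return [
        ["sudo", "-u", user, mvn, "package"],
        ["sudo", "-u", user, ant, "update", "-f", installer],
    ]


def run(commands, cwd, dry_run=True):
    """Echo each command; execute it (stopping on the first failure) unless dry_run."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
```

With `dry_run=False`, `check=True` raises on the first failing step, so `ant update` never runs after a failed Maven build. The target user still needs read access to the source directory, since `sudo -u` keeps the caller's working directory.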

Thanks,

Jacob


[dspace-tech] Re: Flyway migration "Hibernate Workflow Migration" fails

2017-12-13 Thread kardeiz
I've reported this here: https://jira.duraspace.org/browse/DS-3788.

On Wednesday, December 13, 2017 at 2:56:34 PM UTC-6, kar...@gmail.com wrote:
>
> With a new installation of DSpace 6.2, pretty generic configuration (e.g., 
> no edits to anything around Flyway/Hibernate/etc.), and a fresh database 
> with Oracle XE 11.2, `./dspace database migrate` fails at 
> "6.0.2015.08.31 | DS 2701 Hibernate Workflow Migration" with the following error:
>
> Script failed
> -
> SQL State  : 72000
> Error Code : 12991
> Message    : ORA-12991: column is referenced in a multi-column constraint
> Line       : 36
> Statement  : ALTER TABLE cwf_collectionrole DROP COLUMN collection_legacy_id
>
> at org.flywaydb.core.internal.dbsupport.SqlScript.execute(SqlScript.java:117)
> at org.dspace.storage.rdbms.DatabaseUtils.executeSql(DatabaseUtils.java:1089)
> ... 25 more
> Caused by: java.sql.SQLException: ORA-12991: column is referenced in a multi-column constraint
>
> Any thoughts on why this is happening?
>
> Thanks,
>
> Jacob
>


[dspace-tech] Re: Flyway migration "Hibernate Workflow Migration" fails

2017-12-13 Thread kardeiz
Unless I am missing something, this looks like a bug in the Oracle version 
of the migration script. It seems the PostgreSQL commands were copied into 
the Oracle file without the appropriate modifications (apparently Oracle 
doesn't handle column renaming the same way Postgres does).

I ran the relevant SQL data definitions in Oracle directly and got the 
error reported by Flyway.

The faulty file:
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/resources/org/dspace/storage/rdbms/sqlmigration/workflow/oracle/xmlworkflow/V6.0_2015.08.11__DS-2701_Xml_Workflow_Migration.sql
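For ORA-12991, Oracle's documented remedy is to drop the constraints that reference the column first, or to add `CASCADE CONSTRAINTS` to the `DROP COLUMN`. Below is a minimal Python sketch that rewrites a single-column Oracle `DROP COLUMN` statement that way, e.g. for a locally patched copy of the migration. This is an assumption-laden workaround, not the official fix: `CASCADE CONSTRAINTS` silently drops the multi-column constraint, which may need to be recreated on the new column, so compare against whatever fix lands for DS-3788.

```python
import re

# Single-column form only: ALTER TABLE <table> DROP COLUMN <column> [;]
DROP_COLUMN_RE = re.compile(
    r"^(\s*ALTER\s+TABLE\s+\S+\s+DROP\s+COLUMN\s+\S+?)\s*;?\s*$",
    re.IGNORECASE,
)


def with_cascade_constraints(statement):
    """Append CASCADE CONSTRAINTS to an Oracle single-column DROP COLUMN statement.

    Statements that already cascade, or that are not a simple
    ALTER TABLE ... DROP COLUMN ..., are returned unchanged.
    """
    if "CASCADE CONSTRAINTS" in statement.upper():
        return statement
    match = DROP_COLUMN_RE.match(statement)
    if not match:
        return statement
    return match.group(1) + " CASCADE CONSTRAINTS;"
```

Applied to the failing statement, this yields `ALTER TABLE cwf_collectionrole DROP COLUMN collection_legacy_id CASCADE CONSTRAINTS;`.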



On Wednesday, December 13, 2017 at 2:56:34 PM UTC-6, kar...@gmail.com wrote:
>
> With a new installation of DSpace 6.2, pretty generic configuration (e.g., 
> no edits to anything around Flyway/Hibernate/etc.), and a fresh database 
> with Oracle XE 11.2, `./dspace database migrate` fails at 
> "6.0.2015.08.31 | DS 2701 Hibernate Workflow Migration" with the following error:
>
> Script failed
> -
> SQL State  : 72000
> Error Code : 12991
> Message    : ORA-12991: column is referenced in a multi-column constraint
> Line       : 36
> Statement  : ALTER TABLE cwf_collectionrole DROP COLUMN collection_legacy_id
>
> at org.flywaydb.core.internal.dbsupport.SqlScript.execute(SqlScript.java:117)
> at org.dspace.storage.rdbms.DatabaseUtils.executeSql(DatabaseUtils.java:1089)
> ... 25 more
> Caused by: java.sql.SQLException: ORA-12991: column is referenced in a multi-column constraint
>
> Any thoughts on why this is happening?
>
> Thanks,
>
> Jacob
>
