Re: TEXT column > 1Gb

Rob Sargent Wed, 12 Apr 2023 14:29:53 -0700

On 4/12/23 15:03, Joe Carlson wrote:

On Apr 12, 2023, at 12:21 PM, Rob Sargent <[email protected]> wrote:

On 4/12/23 13:02, Ron wrote:
/Must/ the genome all be in one big file, or can you store them one line per table row?
The assumption in the schema I’m using is 1 chromosome per record. Chromosomes are typically strings of continuous sequence (A, C, G, or T) separated by gaps (N) of approximately known, or completely unknown size. In the past this has not been a problem since sequenced chromosomes were maybe 100 megabases. But sequencing is better now with the technology improvements and tackling more complex genomes. So gigabase chromosomes are common.
A typical use case might be from someone interested in seeing if they can identify the regulatory elements (the on or off switches) of a gene. The protein coding part of a gene can be predicted pretty reliably, but the upstream untranslated region and regulatory elements are tougher. So they might come to our web site and want to extract the 5 kb bit of sequence before the start of the gene and look for some of the common motifs that signify a protein binding site. Being able to quickly pull out a substring of the genome to drive a web app is something we want to do quickly.

Well, if you're actually using the sequence, both text and bytea are inherently substring friendly. Your problem goes back to transferring large strings, and that's where http/tomcat is your friend. Sounds like you're web friendly already. You have to stream from the client/supplier, of course.
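
For what it's worth, here is a rough sketch of the "5 kb before the gene" lookup expressed as a substring request. It is only an illustration: the table and column names (chromosome, residues, name) are made up, the gene is assumed to be on the forward strand, and coordinates are assumed to be 1-based as in PostgreSQL's substring(). The query string is shown for shape only and is not executed here.

```python
# Sketch: compute the window for "N bases upstream of a gene start".
# Assumptions (not from the thread): forward strand, 1-based coordinates.

def upstream_window(gene_start, length=5000):
    """Return (start, count) suitable for substring(col FROM start FOR count)."""
    if gene_start < 1:
        raise ValueError("gene_start must be a 1-based position")
    # Clamp at the beginning of the chromosome.
    start = max(1, gene_start - length)
    # Number of bases strictly before the gene start.
    count = gene_start - start
    return start, count


# Hypothetical table/column names; placeholders in DB-API "format" style.
UPSTREAM_SQL = (
    "SELECT substring(residues FROM %s FOR %s) "
    "FROM chromosome WHERE name = %s"
)


def upstream_from_string(seq, gene_start, length=5000):
    """Same window applied to an in-memory string, for checking the arithmetic."""
    start, count = upstream_window(gene_start, length)
    # Convert the 1-based start to a 0-based Python slice.
    return seq[start - 1:start - 1 + count]


if __name__ == "__main__":
    print(upstream_window(10001))                     # window for a gene at 10001
    print(upstream_from_string("ACGTNNACGT", 7, 4))   # tiny toy sequence
```

One caveat, as far as I understand the storage docs: partial fetches of a large text or bytea value are only cheap when the value is stored out of line and uncompressed (STORAGE EXTERNAL), so that setting may be worth checking on the sequence column.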
