Great, many thanks for the tips (works perfectly now) and the warning about
2.5 development in this area.
One array takes about 5 seconds to hash and uses a couple of hundred MB of
RAM. (I have now implemented it as a Hashtable of HashSet<Integer> because
one probeset might be on two arrays.)
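
A minimal sketch of such a reverse lookup, for anyone reading this in the
archives. It is not the actual code from this thread: the class
ProbesetLookup and its methods are illustrative names, and it assumes the
probeset names of each array design have already been read from the fully
parsed CDF file (that is, after cdf.clear() and cdf.read() as described
below). The probeset names in main() are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of BASE or the Fusion SDK.
public class ProbesetLookup {

    // Reverse lookup: probeset name -> ids of all array designs containing it.
    // A set is used as the value because one probeset may be on several arrays.
    private final Map<String, Set<Integer>> designsByProbeset = new Hashtable<>();

    // Register every probeset name found on one array design.
    // In BASE the names would come from cdf.getProbeSetName(index),
    // called only after the whole CDF file has been parsed.
    public void addDesign(int arrayDesignId, List<String> probesetNames) {
        for (String name : probesetNames) {
            designsByProbeset
                .computeIfAbsent(name, k -> new HashSet<>())
                .add(arrayDesignId);
        }
    }

    // Return the ids of the array designs that contain the given probeset,
    // or an empty set if the probeset is unknown.
    public Set<Integer> designsFor(String probesetName) {
        Set<Integer> ids = designsByProbeset.get(probesetName);
        if (ids == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        ProbesetLookup lookup = new ProbesetLookup();
        lookup.addDesign(1, Arrays.asList("ps_a", "ps_b"));
        lookup.addDesign(2, Arrays.asList("ps_b"));
        // "ps_b" is on both designs, "ps_a" only on the first.
        System.out.println(lookup.designsFor("ps_b"));
        System.out.println(lookup.designsFor("ps_a"));
    }
}
```

Storing a plain Integer as the value, as in the first version of the snippet
quoted below, would silently overwrite the earlier array design id whenever a
probeset appears on a second array.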
Is there any possibility that Affy array designs could have their
reporters/features in the database? I'm sorry I don't understand Affy files
enough to know why this can't be done in BASE as it stands. I guess they just
don't fit into ArrayDesignBlocks? Sorry if this has already been discussed to
death elsewhere.
Thanks again!
cheers,
Bob.
Nicklas Nordborg writes:
> > Thanks Nicklas for the help with boolean queries. Now I'm confused by
> > something else also affy-related.
> >
> > I'm trying to write code to lookup which ArrayDesigns a Reporter is on.
> > For non-Affy arrays this is easy (join via arrayDesignBlocks, features and
> > reporter).
> >
> > For Affy arrays it looks like we have to use the CDF file through the Affy
> > API: get all the reporter ids (e.g. ProbeSetNames) and put them in a hash
> > for reverse lookup later.
> >
> > Here's a code snippet to do some of that (see last email for the
> > definition of affyQuery). A lot of the code is taken directly from
> > net/sf/basedb/core/Affymetrix.java, which is used by the "Affymetrix CDF
> > probeset importer" plugin for Affy array design (see the "verify reporters"
> > link on any Affy ArrayDesign page). When I run this plugin as the same
> > user who runs the code below, it works fine. However, I get a
> > NullPointerException with my code...
> >
>
> The Affymetrix.loadCdfFile() method only parses the headers. You must
> call cdf.clear() and cdf.read() to parse the entire file.
>
> I have found that the Fusion SDK is unfortunately not very informative
> when it comes to error messages and doesn't have much error handling
> either.
>
> > Hashtable affyProbeLookup = new Hashtable();
> >
> > ItemResultList<ArrayDesign> affyList = affyQuery.list(dc);
> > for (ArrayDesign ad : affyList) {
> >   int adId = ad.getId();
> >   FusionCDFData cdf = Affymetrix.loadCdfFile(Affymetrix.getCdfFile(ad));
> >   if (cdf == null) continue;
> >   int numProbesets = cdf.getHeader().getNumProbeSets();
> >   int index = 0;
> >   while (index < numProbesets) {
> >     String probesetId = cdf.getProbeSetName(index); // Line 96
> >     affyProbeLookup.put(probesetId, adId);
> >     index++;
> >   }
> > }
> >
> > Exception in thread "main" java.lang.NullPointerException
> >   at affymetrix.gcos.cdf.CDFFileData.getProbeSetName(Unknown Source)
> >   at affymetrix.fusion.cdf.FusionCDFData.getProbeSetName(Unknown Source)
> >   at base_api_test.run_test(base_api_test.java:96)
> >   at base_api_test.main(base_api_test.java:216)
> >
> >
> > note: the cdf object seems OK (numProbesets is set properly).
> >
> >
> > If anyone can point me in the right direction, it would be appreciated.
> > It's a bit difficult not having the Affy source and line numbers
> > (is that available?)
>
> You'll have to ask Affymetrix for that. It all depends on how the
> package was compiled.
>
> I also have to mention that the 2.5 release will have a lot of changes
> in how Affymetrix data is handled.
>
> First, the special case used for Affymetrix has been replaced with a
> more generic way to support file attachments to any raw data type.
> Most of the methods in the Affymetrix class have been deprecated and
> replaced with something else (there are hints in the javadoc).
>
> Second, BASE 2.5 will store CEL, CDF and other large files in a
> compressed format. To avoid having to unpack and copy the CEL and CDF
> files each time we just want to read the first 10-20 header lines, BASE
> 2.5 will ship with a modified version of the Fusion SDK. The
> modifications make it possible to pass a java.io.InputStream to the
> Fusion SDK instead of a filename (java.io.File). This may make it behave
> a bit differently than it did before. We have only modified the parts
> that we are using in BASE. Other parts have been left as they are. If
> you are only doing things similar to what we do in the Affymetrix.java
> class, it should be safe. The modified Fusion SDK can be found at
> http://trac.thep.lu.se/trac/basehacks
>
> /Nicklas
>
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> basedb-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/basedb-devel
--
Bob MacCallum | VectorBase Developer | Kafatos/Christophides Groups |
Division of Cell and Molecular Biology | Imperial College London |
Phone +442075941945 | Email [EMAIL PROTECTED]