Hello, I am quite a novice when it comes to predictive modelling, so I would like to see where my particular problem lies in the spectrum of problems that you have collectively seen in your experience.

Background: I have been handed a piece of software that uses a Kohonen self-organising map (SOM) to analyse and predict data in which missing values are common, but I want to compare its results with other forms of modelling and prediction (e.g. multi-layer perceptrons, random forests?).

My data is a conglomeration of borehole data from hundreds of boreholes. Some measurements were made while the boreholes were being drilled (more or less continuous 'tool responses': geophysical well-logs), and some in the laboratory on discrete samples at scales from 10 cm up to a metre.

The data could be considered ordered series to some extent, though changes in rock types with depth can result in 'step' changes in tool responses.

My problem is not classifying the rocks, but modelling and predicting a physical attribute of them: thermal conductivity, which is a laboratory measurement and therefore expensive and hard to come by. I want to use the more common well-log responses to predict this attribute.

Different boreholes have different sets of well-log data, though. For example, one might have measurements from tools A and B, another from tools A, B, and C, and a third from tools B and C. I can construct a decent database of about 70,000 observations of a common set of 5 tool responses, with about 100 associated measurements of thermal conductivity. I am fairly confident that the relationship between the well-log responses and thermal conductivity is non-linear; linear regression has not proven accurate.
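To make the comparison concrete, this is the kind of non-linear model I have in mind as an alternative to the SOM. It is an untested sketch: 'logs', the tool-response columns toolA..toolE, and the conductivity column tc are invented names standing in for my real table.

## logs: ~70,000 rows of the 5 common tool responses, with tc
## (lab thermal conductivity) present on only ~100 of them
library(randomForest)

train <- subset(logs, !is.na(tc))   # the ~100 calibrated rows

set.seed(1)
fit <- randomForest(tc ~ toolA + toolB + toolC + toolD + toolE,
                    data = train, ntree = 500, importance = TRUE)

## predict thermal conductivity over the full logged intervals
logs$tc_hat <- predict(fit, newdata = logs)

## which tool responses carry most of the signal?
importance(fit)
varImpPlot(fit)

Would a variable-importance measure like this be a sensible way of ranking the tools?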

What 'sort' of problem is this?

Have you seen problems like this, and what did you use to solve them?

I have papers by people using other ANN-type techniques (MLPs in particular) to model and predict thermal conductivity, but I wondered if there was something else I could try.
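For reference, the MLP comparison I am picturing would be something like the sketch below, using the nnet package's single-hidden-layer network. The same invented names apply, and the size and decay values are placeholders that would need tuning.

library(nnet)

train <- subset(logs, !is.na(tc))
cols  <- c("toolA", "toolB", "toolC", "toolD", "toolE")

## nnet works best with scaled inputs
X <- scale(train[, cols])

set.seed(1)
mlp <- nnet(X, train$tc, size = 8, decay = 0.01,
            linout = TRUE, maxit = 1000)   # linout = TRUE for regression

## new data must be scaled with the same centre/spread as the training set
Xnew <- scale(logs[, cols],
              center = attr(X, "scaled:center"),
              scale  = attr(X, "scaled:scale"))
logs$tc_mlp <- predict(mlp, Xnew)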

Some other questions I would like a little guidance on:
Are 100 samples of the 'target' attribute enough for confident modelling and prediction?
How would I quantify the certainty of the modelling results? (A cross-validation sketch of what I mean follows this list.)
The well-log data is extensive, but if I look at the complete set of tool responses there is a LOT of missing data (because there is no common tool set). Is there a way I can still use the less common tool responses?
Is discretisation of the 100 measured thermal conductivities a silly idea? If not, how many 'bins' can I sensibly construct? (A second sketch follows.)
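On quantifying certainty: is leave-one-out cross-validation over the ~100 calibration points the right idea? I am imagining something like this (same invented names as above), with the held-out RMSE as the headline figure:

library(randomForest)

train <- subset(logs, !is.na(tc))
n    <- nrow(train)
pred <- numeric(n)

set.seed(1)
for (i in seq_len(n)) {
  fit <- randomForest(tc ~ toolA + toolB + toolC + toolD + toolE,
                      data = train[-i, ], ntree = 500)
  pred[i] <- predict(fit, newdata = train[i, ])
}

## error on points the model never saw
sqrt(mean((pred - train$tc)^2))
plot(train$tc, pred); abline(0, 1)   # observed vs predicted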
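And on the discretisation question, I assume quantile-based breaks would at least keep the classes balanced; with only ~100 values, more than 4 or 5 bins seems to leave very few observations per class:

## four quantile bins of the ~100 measured conductivities
tc_obs <- logs$tc[!is.na(logs$tc)]
breaks <- quantile(tc_obs, probs = seq(0, 1, length.out = 5))
tc_bin <- cut(tc_obs, breaks = breaks, include.lowest = TRUE)
table(tc_bin)   # roughly 25 observations per bin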

Thanks for reading!
Ben.
